CORRECTIONS

p. 20, line +13: ...directed to the right...

p. 39, Figure (9): The arrow for bQ should be directed upward.

p. 43, line +1: ...of the '⊕' operator...

p. 45, line +8: colored p and s, respectively.
ABSTRACT

A mathematical model for systolic architectures is suggested and used to verify the operation of certain systolic networks. The data items appearing on the communication links of such a network at successive time units are represented by data sequences, and the computations performed by the network cells are modeled by a system of difference equations involving operations on the various data sequences. The input/output descriptions, which describe the global effect of the computations performed by the network, are obtained by solving this system of difference equations. This input/output description can then be used to verify the operation of the network. The suggested verification technique is applied to four different systolic networks proposed in the literature.
Systolic architectures, pioneered by H. T. Kung, are becoming increasingly attractive due to continuous advances in VLSI technology. This type of network architecture has two properties that are very desirable in VLSI implementations, namely regularity and the local nature of the interconnections.

A systolic network can be viewed as a network composed of a few types of computational cells, regularly interconnected via local data links and organized such that streams of data flow smoothly within the network. For an introduction to systolic architectures, we refer to [10], where further references to specific examples are given.

As an introductory example, we briefly review a simple systolic network for the computation of one-dimensional convolution expressions [10]. More specifically, given a sequence of numbers $\{x_1, x_2, \ldots, x_n\}$ and a sequence of weights $\{w_1, w_2, \ldots, w_k\}$, we want to compute the sequence $\{y_1, y_2, \ldots, y_{n+1-k}\}$, where each $y_i$ is defined by:

$$y_i = \sum_{j=1}^{k} w_j\, x_{i+j-1} \qquad (1.1)$$
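As a plain reference point for the verification carried out later, equation (1.1) can be evaluated directly. The short Python sketch below is only an illustration (the function name `convolve` is our own); since Python lists are 0-based, `x[0]` holds $x_1$ and `w[0]` holds $w_1$.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Direct evaluation of (1.1): y_i = sum_{j=1..k} w_j * x_{i+j-1}, i = 1..n+1-k.

    Lists are 0-based, so x[0] holds x_1 and w[0] holds w_1.
    """
    n, k = len(x), len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(n - k + 1)]


print(convolve([1, 2, 3, 4], [1, 10]))  # [21, 32, 43]
```

These are the values that the systolic network is expected to deliver, one every other time unit, on its output link.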

Figure 1 shows the building cell of the 1-D convolution network under discussion. It is a multiply/add cell with a one-word memory to store a real number $w$.
At each clock pulse, the cell receives two input data items, $x_{in}$ and $y_{in}$, performs its computation, and delivers at the next clock pulse the outputs $x_o = x_{in}$ and $y_o = y_{in} + w \cdot x_{in}$. Figure 2 shows three such cells connected into a network that performs the convolution calculation for the case $k = 3$. The elements $x_1, x_2, \ldots, x_n$ are pumped in at the left end of the network, each separated from the other by one time unit, and zeroes are pumped in at the right end. To illustrate the operation of the array, we show in figure 3 the relative location and value of each data item at times $t = 3, 4, 5$ and $6$, where $t = 1$ is the time at which the array started its execution. By following the data paths, we can convince ourselves that the output of the array will include the sequence $\{y_1, y_2, \ldots, y_{n+1-k}\}$.

Although the concept of systolic networks is very well developed, the notation used to describe the input and output data of a systolic network is sometimes ambiguous and reflects poorly the relative timing of the different data streams. Moreover, no rigorous techniques appear to be known for a formal verification of the operation of such networks. To the knowledge of the authors, there has been only one attempt [6] to verify formally the operation of systolic networks, based on a proof technique used in the verification of distributed systems [4]. This technique does not make use of the special properties of systolic networks and hence gives only rather general results.
In this paper, we suggest a technique designed specifically for verifying the operation of systolic networks. In section 2.1 the data sequences are introduced to represent the data appearing on the communication links at successive time intervals. In the same section, we discuss the causal operators which model the computations performed by a cell of the network. This concept was primarily inspired by corresponding approaches in systems theory [7].

In sections 2.2 and 2.3, we present the mathematical model on which the verification technique is based. This model carries some of the properties of a model called "automaton networks" [3], which in turn is a modification of the von Neumann cellular array [5,11]. However, the two models have more differences than similarities, and are used in completely different contexts.

In section 3 we describe the different steps of the suggested technique and give a simple illustrative example. Finally, in sections 4, 5 and 6, we demonstrate the technique by applying it to the verification of some realistic systolic networks that have appeared in the literature.
2. An abstract systolic model.

2.1. Data sequences and causal relations.

We define a data sequence to be an infinite sequence whose elements are members of the set $R_\emptyset = R \cup \{\emptyset\}$, where $R$ is the set of real numbers and $\emptyset$ denotes a special element, not belonging to $R$, called the "don't care element". We extend each one of the four basic arithmetic operations 'op' defined on $R$ to $R_\emptyset$ by adding the rule that the result of any such extended arithmetic operation on $R_\emptyset$ involving $\emptyset$ shall equal $\emptyset$. That is, if 'op' = '+', '-', '*' or '÷', then

$$\emptyset \mathbin{\text{op}} x = x \mathbin{\text{op}} \emptyset = \emptyset \qquad \text{for all } x \in R_\emptyset.$$

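Clearly, operators may also be defined directly on $R_\emptyset$. For example, we will consider later the binary operator $\oplus$ such that for any $x, y \in R_\emptyset$,

$$x \oplus y = x + y \ \text{ if } x, y \neq \emptyset; \qquad x \oplus \emptyset = \emptyset \oplus x = x \qquad (2.1)$$

Two other operators that will be used in section 6 are the operators $\min_\emptyset$ and $\max_\emptyset$, defined on an ordered pair $(x, y)$, $x, y \in R_\emptyset$, by

$$\min_\emptyset(x, y) = \begin{cases} \min\{x, y\} & \text{if } x, y \neq \emptyset \\ y & \text{if } x = \emptyset \text{ or } y = \emptyset \end{cases}$$

and

$$\max_\emptyset(x, y) = \begin{cases} \max\{x, y\} & \text{if } x, y \neq \emptyset \\ x & \text{if } x = \emptyset \text{ or } y = \emptyset \end{cases}$$

where $\min\{\}$ and $\max\{\}$ carry the usual meaning on $R$.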
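The extension rule and the operators $\oplus$, $\min_\emptyset$ and $\max_\emptyset$ are easy to mechanize. The following Python sketch is only an illustration: it assumes that the value `None` represents the don't-care element $\emptyset$, and the names `extend`, `oplus`, `min_dc` and `max_dc` are our own. Division is omitted for brevity; it would be extended in the same way.

```python
import operator
from typing import Callable, Optional

# Assumption: None stands for the don't-care element; anything else is a real number.
DC = None
Elem = Optional[float]


def extend(op: Callable[[float, float], float]) -> Callable[[Elem, Elem], Elem]:
    """Extend a real binary operation to R_phi: any don't-care operand gives don't-care."""
    def extended(x: Elem, y: Elem) -> Elem:
        return DC if x is DC or y is DC else op(x, y)
    return extended


add, sub, mul = extend(operator.add), extend(operator.sub), extend(operator.mul)


def oplus(x: Elem, y: Elem) -> Elem:
    """The operator of (2.1): the don't-care element acts as an identity."""
    if x is DC:
        return y
    if y is DC:
        return x
    return x + y


def min_dc(x: Elem, y: Elem) -> Elem:
    """Ordinary min on R; returns y when either operand is don't-care."""
    return y if x is DC or y is DC else min(x, y)


def max_dc(x: Elem, y: Elem) -> Elem:
    """Ordinary max on R; returns x when either operand is don't-care."""
    return x if x is DC or y is DC else max(x, y)


print(add(2.0, DC), oplus(2.0, DC), min_dc(2.0, DC), max_dc(2.0, DC))  # None 2.0 None 2.0
```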

Let $N$ be the set of positive integers; then any data sequence $\eta$ is defined as a mapping from $N$ to $R_\emptyset$, that is, the image element $\eta(i)$, $i \in N$, is the $i$th element in the sequence. The set of all data sequences, that is the set of all such mappings, will be denoted by $R_\emptyset^* = \{\eta \mid \eta: N \to R_\emptyset\}$.

Any arithmetic operation on $R_\emptyset$ is extended to $R_\emptyset^*$ by applying the operation element-wise to the elements of the sequences, with $\emptyset$ being the result of any undefined operation. For example, if 'op' is a binary operation defined on $R_\emptyset$, then for all $\eta_1, \eta_2 \in R_\emptyset^*$ we have $\eta_1 \mathbin{\text{op}} \eta_2 = \eta_3$, where for all $i \in N$, $\eta_3(i)$ is given by

$$\eta_3(i) = \begin{cases} \eta_1(i) \mathbin{\text{op}} \eta_2(i) & \text{if } \eta_1(i) \mathbin{\text{op}} \eta_2(i) \text{ is defined} \\ \emptyset & \text{otherwise.} \end{cases}$$

We will also use scalar operations on sequences. For example, the scalar product of a sequence $\eta \in R_\emptyset^*$ and a number $w \in R$ is defined as the sequence $\zeta = w \cdot \eta \in R_\emptyset^*$ for which $\zeta(i) = w \cdot \eta(i)$, $i \in N$.

Given the previous definition of data sequences, we define the set of bounded data sequences $\bar{R}_\emptyset^* \subset R_\emptyset^*$ to contain those sequences having only a finite number of non-$\emptyset$ elements. It is then natural to introduce the termination function $T: \bar{R}_\emptyset^* \to N$ such that for any $\eta \in \bar{R}_\emptyset^*$, $T(\eta)$ is the position of the last non-$\emptyset$ element in $\eta$; in other words:

$$\text{for any } \eta \in \bar{R}_\emptyset^*, \quad T(\eta) = i \iff \eta(i) \neq \emptyset \text{ and } \eta(j) = \emptyset \text{ for } j > i.$$

In this paper, we will denote bounded data sequences by small Greek letters and simply refer to them as sequences. This will not cause any confusion because we will never consider anything but bounded data sequences.
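The element-wise extension, the scalar product and the termination function are straightforward to express on finite lists. In the Python sketch below we assume that a bounded sequence is stored as a finite list whose omitted tail consists of $\emptyset$ elements, and that `None` represents $\emptyset$; division by zero serves as the example of an undefined operation. All names are our own.

```python
import operator
from typing import Callable, List, Optional

# Assumption: a bounded sequence is a finite list; omitted trailing elements
# and None entries both stand for the don't-care element.
Seq = List[Optional[float]]


def pad(a: Seq, length: int) -> Seq:
    """Make the implicit trailing don't-care elements explicit up to `length`."""
    return list(a) + [None] * (length - len(a))


def elementwise(op: Callable[[float, float], float], a: Seq, b: Seq) -> Seq:
    """eta_1 'op' eta_2: apply op position by position; don't-care if undefined."""
    n = max(len(a), len(b))
    out: Seq = []
    for x, y in zip(pad(a, n), pad(b, n)):
        if x is None or y is None:
            out.append(None)
            continue
        try:
            out.append(op(x, y))
        except ZeroDivisionError:  # an undefined operation also yields don't-care
            out.append(None)
    return out


def scale(w: float, a: Seq) -> Seq:
    """Scalar product w * eta, element by element."""
    return [None if v is None else w * v for v in a]


def T(a: Seq) -> int:
    """Termination function: 1-based position of the last non-don't-care element."""
    for i in range(len(a), 0, -1):
        if a[i - 1] is not None:
            return i
    raise ValueError("T is undefined for a sequence of don't-care elements only")


print(elementwise(operator.truediv, [1.0, 2.0, None], [2.0, 0.0, 1.0]))  # [0.5, None, None]
```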

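In addition to the operators extended from $R_\emptyset$ to $\bar{R}_\emptyset^*$, we may also define operators directly on $\bar{R}_\emptyset^*$. In general, an $n$-ary sequence operator $\Gamma$ is a transformation $\Gamma: [\bar{R}_\emptyset^*]^n \to \bar{R}_\emptyset^*$, where $[\bar{R}_\emptyset^*]^n = \bar{R}_\emptyset^* \times \bar{R}_\emptyset^* \times \cdots \times \bar{R}_\emptyset^*$ is the cartesian product space of $n$ copies of $\bar{R}_\emptyset^*$. Two basic unary operators that will be frequently used in this paper are the shift operator $\Pi^k$ and the spread operator $\theta_r$, defined by:

$$\Pi^k \xi = \eta \quad \text{and} \quad \theta_r \xi = \zeta,$$

where

$$\eta(i) = \xi(i - k), \qquad i \in N \quad (\text{with } \xi(i) = \emptyset \text{ for } i \le 0),$$

$$\zeta(i) = \begin{cases} \xi(j) & \text{if } i = (j-1)(r+1) + 1, \text{ that is, } i = 1,\ r+2,\ 2r+3,\ \ldots \\ \emptyset & \text{otherwise.} \end{cases}$$

More descriptively, $\Pi^k$ inserts $k$ $\emptyset$-elements at the beginning of a sequence, while $\theta_r$ inserts $r$ $\emptyset$-elements between every two elements of a sequence. For example, if $\xi = a_1, a_2, a_3, a_4, \emptyset, \emptyset, \ldots$, then $T(\xi) = 4$, $\xi(i) = a_i$ for $1 \le i \le T(\xi)$, and

$$\Pi^3 \xi = \emptyset, \emptyset, \emptyset, a_1, a_2, a_3, a_4, \emptyset, \emptyset, \emptyset, \ldots$$

$$\theta_2 \xi = a_1, \emptyset, \emptyset, a_2, \emptyset, \emptyset, a_3, \emptyset, \emptyset, a_4, \emptyset, \emptyset, \ldots$$

It is easy to verify that the termination function generally satisfies

$$T(\Pi^k \xi) = T(\xi) + k, \qquad T(\theta_r \xi) = (r+1)\,T(\xi) - r.$$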
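The shift and spread operators, together with the two identities for $T$, can be checked on the example above with the following Python sketch. It uses the same list encoding as before (our assumption: `None` stands for $\emptyset$ and omitted trailing elements are $\emptyset$); the symbolic entries `"a1"`, ..., `"a4"` stand for $a_1, \ldots, a_4$, and the function names are our own.

```python
from typing import List, Optional, TypeVar

X = TypeVar("X")
# Assumption: None stands for the don't-care element.


def shift(xi: List[Optional[X]], k: int) -> List[Optional[X]]:
    """Pi^k: insert k don't-care elements at the beginning of the sequence."""
    return [None] * k + list(xi)


def spread(xi: List[Optional[X]], r: int) -> List[Optional[X]]:
    """theta_r: insert r don't-care elements between every two elements."""
    out: List[Optional[X]] = []
    for j, value in enumerate(xi):
        if j > 0:
            out.extend([None] * r)
        out.append(value)
    return out


def T(a) -> int:
    """1-based position of the last non-don't-care element."""
    return max(i + 1 for i, v in enumerate(a) if v is not None)


xi = ["a1", "a2", "a3", "a4"]
print(shift(xi, 3))   # [None, None, None, 'a1', 'a2', 'a3', 'a4']
print(spread(xi, 2))  # ['a1', None, None, 'a2', None, None, 'a3', None, None, 'a4']

# Check T(Pi^k xi) = T(xi) + k and T(theta_r xi) = (r+1) T(xi) - r on small cases.
for k in range(4):
    for r in range(4):
        assert T(shift(xi, k)) == T(xi) + k
        assert T(spread(xi, r)) == (r + 1) * T(xi) - r
```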

It is also clear that we can define a sequence operator by combining previously defined sequence operators. For example, we might define an operator $\Gamma: \bar{R}_\emptyset^* \times \bar{R}_\emptyset^* \times \bar{R}_\emptyset^* \to \bar{R}_\emptyset^*$ as follows:

$$\Gamma(\xi, \eta, \zeta) = \Pi[\xi + \eta * \zeta],$$

where square brackets are used for grouping and parentheses for enclosing the arguments of the operator.


We next define a causal operator to be any $n$-ary sequence operator $\Gamma: [\bar{R}_\emptyset^*]^n \to \bar{R}_\emptyset^*$ which satisfies the causality property in the sense that the $i$th element of any of its operands can only affect the $j$th element of its image for $j > i$. In order to formulate this more precisely, assume that for any given sequences $\eta_r \in \bar{R}_\emptyset^*$, $r = 1, 2, \ldots, n$, the image under $\Gamma$ is $\zeta = \Gamma(\eta_1, \ldots, \eta_r, \ldots, \eta_n)$. Then $\Gamma$ is a causal operator if, by replacing any operand $\eta_r$ by another sequence $\eta_r'$ satisfying

$$\eta_r'(t) = \eta_r(t), \qquad 1 \le t < i,$$

the resulting image $\zeta' = \Gamma(\eta_1, \ldots, \eta_r', \ldots, \eta_n)$ satisfies

$$\zeta'(t) = \zeta(t), \qquad 1 \le t \le i.$$

In other words, the value of $\zeta(i)$ depends only on the first $i-1$ elements of $\eta_r$, $1 \le r \le n$.

Similarly, we may define weakly-causal operators, for which the $i$th element $\zeta(i)$ of the image sequence depends only on the first $i$ elements of the operands $\eta_r$, $1 \le r \le n$, instead of the first $i-1$ elements. With this, it is easily seen that the combination $\Gamma_1\Gamma_2$ (or $\Gamma_2\Gamma_1$) of a causal operator $\Gamma_1$ and a weakly-causal operator $\Gamma_2$ is a causal operator. For instance, the shift operator $\Pi^k$, $k \ge 1$, is causal and the spread operator $\theta_r$ is weakly causal; hence, the combined operator $\Pi^k\theta_r$ is causal.
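Causality can at least be probed empirically on sample inputs. The Python sketch below (hypothetical helper names; `None` stands for $\emptyset$) keeps the first $i-1$ elements (causal case) or the first $i$ elements (weakly causal case) of an operand, overwrites the remaining ones, and checks that the first $i$ elements of the image are unchanged. Passing such a test on finitely many inputs is necessary but not sufficient for causality; it merely illustrates the definitions and the claim that $\Pi^k\theta_r$ is causal.

```python
from typing import Callable, List, Optional

Seq = List[Optional[float]]  # None stands for the don't-care element (assumption)


def shift(xi: Seq, k: int) -> Seq:
    return [None] * k + list(xi)


def spread(xi: Seq, r: int) -> Seq:
    out: Seq = []
    for j, v in enumerate(xi):
        if j > 0:
            out.extend([None] * r)
        out.append(v)
    return out


def prefix(a: Seq, i: int) -> Seq:
    """First i elements of a, with implicit trailing don't-care elements made explicit."""
    return (list(a) + [None] * i)[:i]


def passes_causality_test(op: Callable[[Seq], Seq], xi: Seq, weak: bool) -> bool:
    """Keep the first i-1 (causal) or i (weakly causal) elements of xi, overwrite the
    rest with arbitrary values, and require the first i image elements to agree."""
    image = op(xi)
    for i in range(1, len(xi) + 1):
        keep = i if weak else i - 1
        altered = xi[:keep] + [999.0] * (len(xi) - keep)
        if prefix(op(altered), i) != prefix(image, i):
            return False
    return True


xi = [1.0, 2.0, 3.0, 4.0]
print(passes_causality_test(lambda s: shift(s, 1), xi, weak=False))             # True
print(passes_causality_test(lambda s: spread(s, 2), xi, weak=True))             # True
print(passes_causality_test(lambda s: spread(s, 2), xi, weak=False))            # False
print(passes_causality_test(lambda s: shift(spread(s, 2), 1), xi, weak=False))  # True
```

The third line shows why $\theta_r$ alone is only weakly causal: its first image element is the first operand element itself.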

2.2. The abstract model.

In order to define the mathematical model used in our verification technique, we define as usual a loop-less multigraph $G(V, E, \varphi_-, \varphi_+)$ to be composed of

(a) a set $V$ of nodes;

(b) a set $E$ of directed edges;

(c) two functions $\varphi_-, \varphi_+: E \to V$ satisfying the condition that for any edge $e \in E$,

$$\varphi_-(e) \neq \varphi_+(e) \qquad (2.2)$$

For each edge $e \in E$, the nodes $\varphi_-(e)$ and $\varphi_+(e)$ are the source and destination node, respectively, of that edge. Clearly, condition (2.2) prevents any direct loops in the graph. This definition of a multigraph allows any two nodes to be connected by more than one edge in the same direction, a property that may be useful when we represent systolic networks by this abstract model.

As usual in graph terminology, for any node $v \in V$, the edges $\{e ; \varphi_-(e) = v\}$ directed out of $v$ are termed the OUT edges of $v$, while the edges $\{e ; \varphi_+(e) = v\}$ directed into $v$ are termed the IN edges of $v$. Accordingly, the IN-degree and OUT-degree of $v$ are the number of IN edges and OUT edges of $v$, respectively. Any node $v \in V$ with IN-degree zero or OUT-degree zero is called a source or a sink, respectively. All other nodes are called interior nodes of $G$. We shall use the notation $V_S$, $V_T$ and $V_I$ for the subsets of $V$ containing the source, sink and interior nodes of $V$, respectively. Of course, the condition $V_S \cup V_T \cup V_I = V$ is always satisfied.
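The classification of nodes into $V_S$, $V_T$ and $V_I$ is mechanical. The following Python sketch is a minimal illustration under our own encoding: an edge is a pair $(\varphi_-(e), \varphi_+(e))$ stored in a list, so parallel edges are allowed. As a test case it uses the graph of the 1-D convolution network discussed in section 3, with nodes $-1, 0, \ldots, k+2$.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (phi_minus(e), phi_plus(e)): source node and destination node


def classify_nodes(nodes: Iterable[int], edges: List[Edge]) -> Dict[str, Set[int]]:
    """Split V into sources V_S, sinks V_T and interior nodes V_I."""
    node_set = set(nodes)
    in_deg = {v: 0 for v in node_set}
    out_deg = {v: 0 for v in node_set}
    for tail, head in edges:
        if tail == head:
            raise ValueError(f"condition (2.2) violated: loop at node {tail}")
        out_deg[tail] += 1
        in_deg[head] += 1
    sources = {v for v in node_set if in_deg[v] == 0}
    sinks = {v for v in node_set if out_deg[v] == 0}
    return {"V_S": sources, "V_T": sinks, "V_I": node_set - sources - sinks}


def convolution_graph(k: int) -> List[Edge]:
    """Edges of the 1-D convolution network of section 3 (hypothetical encoding)."""
    edges: List[Edge] = [(-1, 1), (k + 2, k)]   # edges leaving the two source nodes
    for i in range(1, k + 1):
        edges += [(i, i + 1), (i, i - 1)]       # left-directed and right-directed OUT edges
    return edges


k = 3
parts = classify_nodes(range(-1, k + 3), convolution_graph(k))
assert parts == {"V_S": {-1, k + 2}, "V_T": {0, k + 1}, "V_I": {1, 2, 3}}
```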

With this notion of a multigraph, we define our abstract systolic model to be composed of the following components:

[A1] A multigraph $G(V, E, \varphi_-, \varphi_+)$.

[A2] A coloring function $\mathrm{col}: E \to C_E$, which maps $E$ into a given finite set of colors $C_E$, and hence assigns a color to each edge in $E$. The coloring function is assumed to satisfy the condition that the different IN edges of a node have different colors, and correspondingly that the different OUT edges of a node have different colors. Edge colors $y = \mathrm{col}(e)$ will be denoted by lower case letters.

[A3] For each edge $e \in E$, a sequence $\xi_e \in \bar{R}_\emptyset^*$ is specified.

[A4] For each interior node $v \in V$ with IN-degree $m$ and OUT-degree $n$, we are given $n$ causal $m$-ary operators $\Gamma_v^i: [\bar{R}_\emptyset^*]^m \to \bar{R}_\emptyset^*$ which specify the "node I/O description". More specifically, if $\eta_j$, $j = 1, 2, \ldots, m$, and $\zeta_i$, $i = 1, 2, \ldots, n$, are the sequences associated with the IN and OUT edges of $v$, respectively, then the $n$ relations

$$\zeta_i = \Gamma_v^i(\eta_1, \eta_2, \ldots, \eta_m), \qquad i = 1, 2, \ldots, n$$

are the I/O description of $v$. The different IN and OUT edges of $v$ are distinguished in the I/O description by their colors.

Since by condition [A2] all edges terminating at a given node $v$ have different colors, it follows that any edge $e \in E$ is uniquely identified by a pair $(y, v)$, where $y = \mathrm{col}(e)$ and $v = \varphi_+(e)$. To simplify the notation, the pair $(y, v)$ will often be written in the form $y_v$, and the sequence associated with that edge will be identified by the symbol $\eta_v$, where we replaced the letter $y$ by its corresponding Greek letter $\eta$.

For practical applications, it is generally desirable to identify the nodes of the network by appropriate labels which correspond to the problem at hand. This means that we introduce a set $L$ of labels together with a one-to-one function $\gamma: V \to L$ from $V$ onto $L$. In our examples, we usually identify the nodes directly with their labels.

After defining the general abstract model, we next show how it can be used to define a general systolic network.
2.3. The general systolic network.

By giving a physical interpretation to each component in the general abstract model we obtain a general systolic network. The basic idea of this interpretation may be summarized as follows:

Each interior node represents a computational cell and each source/sink node corresponds to an input/output cell for the overall network. To distinguish the computational cells from the I/O cells in our figures, we depict computational cells by circular nodes and I/O cells by square nodes.

Each edge $x_y \in E$ represents a unidirectional communication link between the two cells it connects. The sequence associated with $x_y$ then comprises the data items that appeared on it in consecutive time units. More specifically, if $\xi_y$ is the sequence associated with $x_y$, then the $i$th element of $\xi_y$, namely $\xi_y(i)$, is the data item that appeared on $x_y$ at time $t = i$ units, where $t = 1$ is the time at which the network started its operation.

For an interior node, the node I/O description describes the computation performed by the cell corresponding to that node. We illustrate this with two simple examples:

EX 1: The node shown in figure 4 represents a simple latch cell which produces at any time $t > 1$ on its output link the same data item that appeared on its input link at time $t - 1$. At time $t = 1$, we have $\eta(1) = \emptyset$, which corresponds to the fact that at the beginning of the network operation, no specific data item appeared on the output link.

EX 2: The operation of the multiply-add cell mentioned in section 1 and shown in figure 1 may be represented by the following node I/O descriptions:

$$\xi_o = \Pi\, \xi_{in} \qquad (2.3.a)$$

$$\eta_o = \Pi[\eta_{in} + w \cdot \xi_{in}] \qquad (2.3.b)$$

where $w \in R$ is a given real number and $\xi_{in}$, $\eta_{in}$, $\xi_o$ and $\eta_o$ are the input and output sequences of the node as shown in figure 5.
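To see that (2.3) captures the clocked behavior of the multiply-add cell, one can compare the operator form with a register-level simulation in which the outputs at time $t$ are computed from the inputs at time $t-1$. In this Python sketch `None` stands for $\emptyset$, and `cell_by_operators` and `cell_by_clock` are illustrative names of our own; the latch of EX 1 corresponds to the $\xi$ path alone, a single application of $\Pi$.

```python
from typing import List, Optional, Tuple

Seq = List[Optional[float]]  # None stands for the don't-care element (assumption)


def shift(a: Seq, k: int = 1) -> Seq:
    return [None] * k + list(a)


def add(a: Seq, b: Seq) -> Seq:
    n = max(len(a), len(b))
    a, b = list(a) + [None] * (n - len(a)), list(b) + [None] * (n - len(b))
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


def scale(w: float, a: Seq) -> Seq:
    return [None if v is None else w * v for v in a]


def cell_by_operators(xi_in: Seq, eta_in: Seq, w: float) -> Tuple[Seq, Seq]:
    """Node I/O description (2.3): xi_o = Pi xi_in, eta_o = Pi[eta_in + w * xi_in]."""
    return shift(xi_in), shift(add(eta_in, scale(w, xi_in)))


def cell_by_clock(xi_in: Seq, eta_in: Seq, w: float) -> Tuple[Seq, Seq]:
    """Clock-by-clock simulation with one-cycle output registers."""
    n = max(len(xi_in), len(eta_in))
    xs = list(xi_in) + [None] * (n - len(xi_in))
    ys = list(eta_in) + [None] * (n - len(eta_in))
    x_reg: Optional[float] = None  # nothing specific is on the output links at t = 1
    y_reg: Optional[float] = None
    xi_o: Seq = []
    eta_o: Seq = []
    for x, y in zip(xs, ys):
        xi_o.append(x_reg)
        eta_o.append(y_reg)
        x_reg = x
        y_reg = None if x is None or y is None else y + w * x
    xi_o.append(x_reg)
    eta_o.append(y_reg)
    return xi_o, eta_o


xi_in, eta_in = [1.0, None, 2.0], [0.5, 3.0, 1.0]
print(cell_by_operators(xi_in, eta_in, 2.0))  # ([None, 1.0, None, 2.0], [None, 2.5, None, 5.0])
assert cell_by_operators(xi_in, eta_in, 2.0) == cell_by_clock(xi_in, eta_in, 2.0)
```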

Since in any practical dynamical system any data item produced by a computational cell at time $t$ depends only on the data provided to that cell at times less than $t$, we immediately see the importance of the condition imposed in section 2.2 on the node I/O descriptions, namely that only causal operators in the sense of section 2.1 are used. We also note that with the model described above, the computational power of each cell is not limited to simple arithmetical operations. In other words, a cell could be an intelligent cell that can perform elaborate calculations, provided that we can express these calculations in terms of causal operators.

We call "network output sequences" those sequences associated with the IN edges of sink nodes, and "network input sequences" those associated with the OUT edges of source nodes. Then the system of all node I/O descriptions provides a specification of the computation performed by the network in the form of an implicit relation between the network input and output sequences. This relation will be called the "network I/O description".

As a simple example, consider the hypothetical network with the graph shown in figure 6. In this graph, we assume that the edges directed to the left are given the color $y$ and those directed to the right the color $x$. We also follow the naming convention mentioned in section 2.2 in identifying the different edges in the graph. To complete the network description, a node I/O description has to be specified for each node in the graph. Assume that these are given by the following causal relations:

For  node  1:  ■ 


n  [  3 


(2.4  .a) 
*0  -  n

C  «1  *  i»l  ] 

(2.4.b) 

For 

node 

2: 

t3  -  n 

<2 

(2.5) 

For 

node 

3: 

-  n 

t  ^3  '  ^  1 

(2.6) 

Figure (6)

For  this  network,  r>3  and  ^  are  the  network  input  sequences 

and  is  the  network  output  sequence.  In  order  to  obtain 

the network I/O description explicitly, we have to solve the equations (2.4), (2.5) and (2.6), that is, we have to obtain an explicit expression for the network output sequence in terms of the network input sequences.

Generally, it is very difficult, and sometimes impossible, to derive an explicit solution of the system of node I/O equations. In the next section, we show that this task may be greatly simplified in the case of certain networks with a homogeneous structure.

3. Homogeneous Systolic Networks.

By condition [A2], any edge $e \in E$ is uniquely identified by its color and one of its incident nodes. In fact, we used this already as a convenient means for identifying edges by their color and terminal node. Let $M \subset C_E \times V_I$ be the set of all pairs $(y, v)$, $y \in C_E$, $v \in V_I$, for which there is an edge $e \in E$ with $y = \mathrm{col}(e)$ and $v = \varphi_-(e)$. Then the terminal node $u = \varphi_+(e)$ is uniquely given, and hence the successor function $\mu: M \to V_I \cup V_T$ is well defined by the association

$$(y, v) \in M,\ y = \mathrm{col}(e),\ v = \varphi_-(e) \ \Longrightarrow\ \mu(y, v) = \varphi_+(e).$$

In other words, if there exists an edge $e$ with color $y$ and starting node $v$, then $\mu(y, v)$ is the terminal node of $e$.
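The successor function and condition [A2] can likewise be sketched in a few lines. In this Python illustration (our own encoding: a colored edge is a triple $(\varphi_-(e), \varphi_+(e), \mathrm{col}(e))$), the function rejects graphs in which two IN edges or two OUT edges of a node share a color, and returns $\mu$ as a dictionary restricted to interior starting nodes. The test graph is the convolution network treated later in this section, for which $\mu(s, i) = i + 1$ and $\mu(p, i) = i - 1$.

```python
from typing import Dict, List, Set, Tuple

ColoredEdge = Tuple[int, int, str]  # (phi_minus(e), phi_plus(e), col(e))


def successor_function(edges: List[ColoredEdge], interior: Set[int]) -> Dict[Tuple[str, int], int]:
    """Build mu(y, v) = phi_plus(e) for edges e of color y leaving an interior node v,
    after checking condition [A2] on IN and OUT edge colors."""
    seen_out: Set[Tuple[str, int]] = set()
    seen_in: Set[Tuple[str, int]] = set()
    mu: Dict[Tuple[str, int], int] = {}
    for tail, head, color in edges:
        if (color, tail) in seen_out or (color, head) in seen_in:
            raise ValueError(f"condition [A2] violated by edge {tail}->{head} ({color})")
        seen_out.add((color, tail))
        seen_in.add((color, head))
        if tail in interior:
            mu[(color, tail)] = head
    return mu


k = 3
edges: List[ColoredEdge] = [(-1, 1, "s"), (k + 2, k, "p")]
for i in range(1, k + 1):
    edges += [(i, i + 1, "s"), (i, i - 1, "p")]
mu = successor_function(edges, interior=set(range(1, k + 1)))
assert all(mu[("s", i)] == i + 1 and mu[("p", i)] == i - 1 for i in range(1, k + 1))
```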

Given a systolic network based on the graph $G = \{V, E, \varphi_-, \varphi_+\}$, a subset $V_I' \subseteq V_I$ of interior nodes is said to be a homogeneous set if:

[H1] All the nodes in $V_I'$ have identical IN and OUT degrees, say $m$ and $n$, respectively.

[H2] The $m$ colors of the IN edges are the same for every interior node $v \in V_I'$, and so are the $n$ colors of the OUT edges of $v$. Denote the colors of the IN and OUT edges of $v$ by $y_1, y_2, \ldots, y_m$ and $z_1, z_2, \ldots, z_n$, respectively.

[H3] The node I/O descriptions of any interior node $v \in V_I'$ are generic in the sense that they may be written in the form:

$$\zeta^i_{\mu(z_i, v)} = \Gamma^i(\eta^1_v, \eta^2_v, \ldots, \eta^m_v), \qquad i = 1, 2, \ldots, n$$

where $\Gamma^i$, $i = 1, 2, \ldots, n$, are given $m$-ary operators which are independent of the particular node in $V_I'$, $\mu$ is the successor function defined earlier in this section, and $\eta^j_v$, $j = 1, 2, \ldots, m$, and $\zeta^i_{\mu(z_i, v)}$, $i = 1, 2, \ldots, n$, are the sequences associated with the IN and OUT edges of $v$, respectively.

A network is said to be homogeneous if the set of interior nodes in its graph $G$ is a homogeneous set. More generally, if there exists a partition $V_I = V_I^1 \cup V_I^2 \cup \cdots \cup V_I^k$ of $V_I$ into $k$ non-empty homogeneous subsets $V_I^1, V_I^2, \ldots, V_I^k$, then the network is said to be $k$-partially homogeneous.

The main advantage of having a homogeneous (or partially homogeneous) network is that the resulting system of equations has a repetitive pattern which, in many cases, allows us to obtain an analytical solution to the system. This should become clearer as we proceed with the different examples.

To verify the operation of a systolic network, we are generally interested in its behavior for specific inputs, that is, we wish to find the form of the network output sequences for specific network input sequences. This is usually accomplished by substituting the given input sequences in the network I/O description and manipulating the resulting equations to obtain the description of the network output sequences.

As a first example of our verification technique, we consider again the 1-D convolution network described in section 1. The graph of this network is shown in figure 7, where we assumed that the edges directed to the left have the color 's', while those directed to the right have the color 'p'. The nodes of the graph are identified by the integers $-1, 0, 1, 2, \ldots, k+1, k+2$, where nodes $-1$ and $k+2$ are source nodes, nodes $0$ and $k+1$ sink nodes, and nodes $1$ through $k$ interior nodes. The successor function is defined for any interior node $i = 1, 2, \ldots, k$ by

$$\mu(y, i) = \begin{cases} i + 1 & \text{if } y = s \\ i - 1 & \text{if } y = p \end{cases}$$

Figure (7)

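Our goal is to verify that the network indeed produces the results of equation (1.1) for the network input sequences described by

$$\sigma_1 = \Pi^{k-1}\theta_1\iota \qquad (3.1.a)$$

$$\pi_k = \theta_1\xi \qquad (3.1.b)$$

where

$$T(\iota) = n - (k-1), \quad \iota(t) = 0, \qquad 1 \le t \le T(\iota),$$

$$T(\xi) = n, \quad \xi(t) = x_t, \qquad 1 \le t \le T(\xi).$$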

The I/O description of a typical interior node $i$ in the graph, $1 \le i \le k$, is given by the following causal relations

$$\pi_{i-1} = \Pi\,\pi_i \qquad (3.2.a)$$

$$\sigma_{i+1} = \Pi[\sigma_i + w_i \cdot \pi_i] \qquad (3.2.b)$$

This system of difference equations is easily solved. First, note that the solution of (3.2.a) obviously is

$$\pi_i = \Pi^{k-i}\pi_k \qquad (3.3)$$

By substituting this in (3.2.b) we obtain

$$\sigma_{i+1} = \Pi\sigma_i + w_i \cdot [\Pi^{k-i+1}\pi_k] \qquad (3.4)$$

The solution of (3.4) is then given by lemma 1 in the appendix as:

$$\sigma_{k+1} = \Pi^k\sigma_1 + \sum_{j=1}^{k} \Pi^{2j-1}[w_{k-j+1} \cdot \pi_k] \qquad (3.5)$$

This is the I/O description for the network.

In order to find the specific form of the output sequence $\sigma_{k+1}$ for the input sequences (3.1), we substitute these sequences into (3.5) and obtain

$$\sigma_{k+1} = \Pi^{2k-1}\theta_1\iota + \sum_{j=1}^{k} \Pi^{2j-1}[w_{k-j+1} \cdot \theta_1\xi]$$

By the properties P1, P2, P3 and P4 in the appendix, this may be rewritten as

$$\begin{aligned} \sigma_{k+1} &= \Pi^{2k-1}\theta_1\iota + \Pi \sum_{j=1}^{k} \Pi^{2(j-1)}\theta_1[w_{k-j+1} \cdot \xi] \\ &= \Pi^{2k-1}\theta_1\iota + \Pi\theta_1 \sum_{j=1}^{k} \Pi^{j-1}[w_{k-j+1} \cdot \xi] \\ &= \Pi^{2k-1}\theta_1\iota + \Pi\theta_1\zeta, \end{aligned}$$

where $\zeta = \sum_{j=1}^{k}\Pi^{j-1}\gamma_j$, $T(\gamma_j) = T(\xi) = n$ and $\gamma_j(t) = w_{k-j+1}\,\xi(t) = w_{k-j+1}\,x_t$. Finally, applying P5 of the appendix we find:

$$\begin{aligned} \sigma_{k+1} &= \Pi^{2k-1}\theta_1\iota + \Pi\theta_1\Pi^{k-1}\eta \\ &= \Pi^{2k-1}\theta_1[\iota + \eta] \\ &= \Pi^{2k-1}\theta_1\eta, \end{aligned} \qquad (3.6)$$

where $\eta$ is defined by:

$$T(\eta) = n - (k-1),$$

$$\eta(t) = \sum_{j=1}^{k}\gamma_j(t+k-j) = \sum_{j=1}^{k} w_{k-j+1}\,x_{t+k-j} = \sum_{q=1}^{k} w_q\,x_{t+q-1}, \qquad 1 \le t \le T(\eta).$$

In the last line, the summation index was changed to $q = k - j + 1$ in order to provide for the same expression as in (1.1).

Evidently, equation (3.6) represents the output of the array in a clear and precise form; it indicates that after an initial period of $2k - 1$ time units, the elements $\eta(t) = y_t$, $1 \le t \le n - (k-1)$, will appear on the output link, each separated from the other by one time unit.
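The derivation above can be cross-checked numerically for particular inputs. The following Python sketch is our own illustration, not part of the original technique: it represents a bounded sequence by a finite list whose omitted trailing elements are $\emptyset$, with `None` standing for $\emptyset$, propagates the input sequences (3.1) through the node equations (3.2), and compares the result with the closed form (3.6). The names `convolution_network` and `predicted_output` are hypothetical.

```python
from typing import List, Optional

Seq = List[Optional[float]]  # None stands for the don't-care element (assumption)


def shift(a: Seq, k: int) -> Seq:
    return [None] * k + list(a)


def spread(a: Seq, r: int) -> Seq:
    out: Seq = []
    for j, v in enumerate(a):
        if j > 0:
            out.extend([None] * r)
        out.append(v)
    return out


def add(a: Seq, b: Seq) -> Seq:
    n = max(len(a), len(b))
    a, b = list(a) + [None] * (n - len(a)), list(b) + [None] * (n - len(b))
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


def scale(w: float, a: Seq) -> Seq:
    return [None if v is None else w * v for v in a]


def trim(a: Seq) -> Seq:
    """Drop explicit trailing don't-care elements so that equal sequences compare equal."""
    a = list(a)
    while a and a[-1] is None:
        a.pop()
    return a


def convolution_network(x: List[float], w: List[float]) -> Seq:
    """Propagate the inputs (3.1) through the node equations (3.2); return sigma_{k+1}."""
    n, k = len(x), len(w)
    pi = {k: spread(x, 1)}                                       # (3.1.b)
    for i in range(k, 0, -1):
        pi[i - 1] = shift(pi[i], 1)                              # (3.2.a)
    sigma = {1: shift(spread([0.0] * (n - k + 1), 1), k - 1)}    # (3.1.a)
    for i in range(1, k + 1):
        sigma[i + 1] = shift(add(sigma[i], scale(w[i - 1], pi[i])), 1)  # (3.2.b)
    return trim(sigma[k + 1])


def predicted_output(x: List[float], w: List[float]) -> Seq:
    """Right-hand side of (3.6): Pi^{2k-1} theta_1 eta with eta(t) = y_t from (1.1)."""
    n, k = len(x), len(w)
    y = [sum(w[q] * x[t + q] for q in range(k)) for t in range(n - k + 1)]
    return trim(shift(spread(y, 1), 2 * k - 1))


print(convolution_network([1, 2, 3, 4], [1, 10]))
# [None, None, None, 21.0, None, 32.0, None, 43.0]
assert convolution_network([1, 2, 3, 4], [1, 10]) == predicted_output([1, 2, 3, 4], [1, 10])
```

For $x = (1, 2, 3, 4)$ and $w = (1, 10)$, the values $y_1 = 21$, $y_2 = 32$, $y_3 = 43$ appear at times $4, 6, 8$, that is, after the initial period of $2k - 1 = 3$ time units and one time unit apart.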


A variation of the above 1-D convolution network may be obtained by defining the I/O description of each node in the network to be given by (3.2.a) and (3.2.b) with the $+$ operation replaced by the $\oplus$ operation defined by (2.1). By a similar analysis, it can be shown that the output of the modified network is described by

$$\sigma_{k+1} = \Pi\theta_1\eta',$$

where $T(\eta') = n + k - 1$ and

$$\eta'(t) = \begin{cases} \sum_{j=1}^{t} w_{k-j+1}\,x_{t-j+1}, & 1 \le t \le k-1 \\ \sum_{j=1}^{k} w_{k-j+1}\,x_{t-j+1}, & k \le t \le n \\ \sum_{j=t-n+1}^{k} w_{k-j+1}\,x_{t-j+1}, & n+1 \le t \le T(\eta'). \end{cases}$$
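The $\oplus$ variant can be checked in the same way. This Python sketch repeats the sequence helpers so that it runs on its own, replaces $+$ by an element-wise $\oplus$ in (3.2.b), and compares the result with $\Pi\theta_1\eta'$; the three cases for $\eta'(t)$ are merged into a single sum over those $j$ with $1 \le t - j + 1 \le n$. As before, `None` plays the role of $\emptyset$ and the function names are our own.

```python
from typing import List, Optional

Seq = List[Optional[float]]  # None stands for the don't-care element (assumption)


def shift(a: Seq, k: int) -> Seq:
    return [None] * k + list(a)


def spread(a: Seq, r: int) -> Seq:
    out: Seq = []
    for j, v in enumerate(a):
        if j > 0:
            out.extend([None] * r)
        out.append(v)
    return out


def oplus(a: Seq, b: Seq) -> Seq:
    """Element-wise operator (2.1): a don't-care operand acts as an identity."""
    n = max(len(a), len(b))
    a, b = list(a) + [None] * (n - len(a)), list(b) + [None] * (n - len(b))
    return [y if x is None else x if y is None else x + y for x, y in zip(a, b)]


def scale(w: float, a: Seq) -> Seq:
    return [None if v is None else w * v for v in a]


def trim(a: Seq) -> Seq:
    a = list(a)
    while a and a[-1] is None:
        a.pop()
    return a


def modified_network(x: List[float], w: List[float]) -> Seq:
    """Inputs (3.1) propagated through (3.2) with + replaced by the oplus operator."""
    n, k = len(x), len(w)
    pi = {k: spread(x, 1)}
    for i in range(k, 0, -1):
        pi[i - 1] = shift(pi[i], 1)
    sigma = {1: shift(spread([0.0] * (n - k + 1), 1), k - 1)}
    for i in range(1, k + 1):
        sigma[i + 1] = shift(oplus(sigma[i], scale(w[i - 1], pi[i])), 1)
    return trim(sigma[k + 1])


def predicted_modified(x: List[float], w: List[float]) -> Seq:
    """Pi theta_1 eta', with eta'(t) summed over j = 1..k such that 1 <= t-j+1 <= n."""
    n, k = len(x), len(w)
    eta = [sum(w[k - j] * x[t - j] for j in range(1, k + 1) if 1 <= t - j + 1 <= n)
           for t in range(1, n + k)]                 # t = 1 .. T(eta') = n+k-1
    return trim(shift(spread(eta, 1), 1))


assert modified_network([1, 2, 3, 4], [1, 10]) == predicted_modified([1, 2, 3, 4], [1, 10])
```

With $x = (1, 2, 3, 4)$ and $w = (1, 10)$ one gets $\eta' = (10, 21, 32, 43, 4)$: the interior values are again $y_1, y_2, y_3$, while the first and last elements are the partial sums produced at the two ends of the data stream.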


In the previous example we applied our technique to a homogeneous network. The technique is equally applicable to $k$-partially homogeneous networks if $k$ is reasonably small. In that case, a system of difference equations is formed by writing the generic I/O description for a typical node from each homogeneous subset of interior nodes $V_I^i$, $i = 1, 2, \ldots, k$. The network I/O description is then obtained by solving this system of equations. The back substitution network and the sorting networks discussed in sections 5 and 6 are examples of 2-partially homogeneous networks. The LU decomposition network described in [1] is a 4-partially homogeneous network that can be verified by the same technique.

Finally, we note that the explicit derivation of the network I/O description depends on our ability to solve the resulting system of difference equations. However, even if these equations cannot be solved explicitly, we may still verify the operation of the network if we have an idea about the network behavior and consequently about the sequences on the different edges of the graph. In fact, we need to show only that for the given input sequences, the expected sequences satisfy the system of difference equations. We demonstrate this procedure in section 6 by verifying the operation of a sorting network for which we could not solve the system of equations explicitly.
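The verification-by-substitution procedure can be illustrated on the convolution network, where the answer is known. In this Python sketch (hypothetical names; `None` stands for $\emptyset$) the sequences on all edges are first guessed in closed form, namely $\pi_i = \Pi^{k-i}\pi_k$ from (3.3) and $\sigma_i = \Pi^{i-1}\sigma_1 + \sum_{m=1}^{i-1} w_m \Pi^{i+k-2m}\pi_k$, obtained by unrolling (3.4); the only thing checked is that these guesses satisfy the node equations (3.2) for the given inputs.

```python
from typing import Dict, List, Optional, Tuple

Seq = List[Optional[float]]  # None stands for the don't-care element (assumption)


def shift(a: Seq, k: int) -> Seq:
    return [None] * k + list(a)


def spread(a: Seq, r: int) -> Seq:
    out: Seq = []
    for j, v in enumerate(a):
        if j > 0:
            out.extend([None] * r)
        out.append(v)
    return out


def add(a: Seq, b: Seq) -> Seq:
    n = max(len(a), len(b))
    a, b = list(a) + [None] * (n - len(a)), list(b) + [None] * (n - len(b))
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


def scale(w: float, a: Seq) -> Seq:
    return [None if v is None else w * v for v in a]


def trim(a: Seq) -> Seq:
    a = list(a)
    while a and a[-1] is None:
        a.pop()
    return a


def guessed_sequences(x: List[float], w: List[float]) -> Tuple[Dict[int, Seq], Dict[int, Seq]]:
    """Closed-form guesses for the sequences on every edge of the convolution network."""
    n, k = len(x), len(w)
    pi_k = spread(x, 1)                                        # (3.1.b)
    sigma_1 = shift(spread([0.0] * (n - k + 1), 1), k - 1)     # (3.1.a)
    pi = {i: shift(pi_k, k - i) for i in range(0, k + 1)}
    sigma: Dict[int, Seq] = {}
    for i in range(1, k + 2):
        s = shift(sigma_1, i - 1)
        for m in range(1, i):
            s = add(s, scale(w[m - 1], shift(pi_k, i + k - 2 * m)))
        sigma[i] = s
    return pi, sigma


def satisfies_node_equations(pi: Dict[int, Seq], sigma: Dict[int, Seq], w: List[float]) -> bool:
    """Substitute the guesses into (3.2.a) and (3.2.b) for every interior node."""
    for i in range(1, len(w) + 1):
        if trim(pi[i - 1]) != trim(shift(pi[i], 1)):
            return False
        if trim(sigma[i + 1]) != trim(shift(add(sigma[i], scale(w[i - 1], pi[i])), 1)):
            return False
    return True


pi, sigma = guessed_sequences([1.0, 2.0, 3.0, 4.0], [1.0, 10.0])
print(satisfies_node_equations(pi, sigma, [1.0, 10.0]))  # True
print(satisfies_node_equations(pi, sigma, [1.0, 11.0]))  # False
```

Checking the same guesses against node equations with a different weight $w_2$ fails, as it should; the guess is accepted only for the network it was made for.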
4. A band matrix multiplication network.

In [1], Kung and Leiserson suggested a systolic network for the computation of the product of two band matrices $C = A * B$, where both $A$ and $B$ have lower bandwidth $k_1$ and upper bandwidth $k_2$. In this section, we shall consider only the case $k_1 = k_2 = k$ and prove formally that the suggested network indeed produces the product matrix $C$. Moreover, the sequence notation used in the verification procedure will provide an accurate representation of the I/O data, including the input timing required for proper operation and the timing of the output data.

In figure 8.a we show the directed graph of the matrix multiplication network. The nodes of the graph are regularly laid out so that each node can be labeled by a pair $(i, j)$ of integers, where $i$ and $j$ are the relative position of the node with respect to the two perpendicular axes shown in the figure. The set of colors $C_E$ has three elements, namely $p$, $r$ and $s$, and the coloring function $\mathrm{col}()$ maps the edges directed to the south-west, south-east and north to the colors $p$, $r$ and $s$, respectively.

The network is homogeneous; it consists of only one type of computational cell, namely the multiply-add type cell shown in figure 8.b. Its generic I/O description is given by the causal relations:
$$\rho_{i,j-1} = \Pi\,\rho_{i,j} \qquad (4.1.a)$$

$$\pi_{i-1,j} = \Pi\,\pi_{i,j} \qquad (4.1.b)$$

$$\sigma_{i+1,j+1} = \Pi[\sigma_{i,j} + \rho_{i,j} \cdot \pi_{i,j}] \qquad (4.1.c)$$


In line with the definition of homogeneous networks, this description is valid for any cell $(i, j)$, $-k \le i, j \le k$.

As an illustration of the network topology and its different data streams, we show in figure 8.c the general network for the special case $k = 1$, that is, for the case of two tridiagonal matrices $A$ and $B$. In the figure, the source/sink cells were omitted for clarity.


In order to obtain the I/O description of the network,
we have to solve the system of difference equations (4.1)
and express the network output sequences $s_{q,k+1}$ and
$s_{k+1,q}$, $-(k-1) \le q \le k+1$, in terms of the network input
sequences $p_{u,k}$, $r_{k,u}$, $s_{-k,u}$ and $s_{u,-k}$, $-k \le u \le k$.
For this, consider first the simple equations (4.1.a) and
(4.1.b), which have the solutions

$$p_{i,j} = \Pi^{k-j}\, p_{i,k}, \qquad r_{i,j} = \Pi^{k-i}\, r_{k,j}.$$

By  substituting  these  values  into  (4.1.c)  we  obtain 


$$s_{i+1,j+1} = \Pi\,[\, s_{i,j} + \Delta_{i,j} \,] \qquad (4.2)$$

where $\Delta_{i,j} = \Pi^{k-j}\, p_{i,k} \cdot \Pi^{k-i}\, r_{k,j}$.
By an inductive argument similar to the one given in
the appendix for lemma 1, it is easily shown that for
$-(k-1) \le i,j \le k+1$, (4.2) has the solution:

$$s_{i,j} = \begin{cases} \Pi^{k+i}\, s_{-k,\,j-i-k} + \sum_{q=1}^{k+i} \Pi^{q}\, \Delta_{i-q,\,j-q}, & i \le j, \\[4pt] \Pi^{k+j}\, s_{i-j-k,\,-k} + \sum_{q=1}^{k+j} \Pi^{q}\, \Delta_{i-q,\,j-q}, & i > j. \end{cases}$$


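With the definition of $\Delta_{i,j}$ and properties P1 and P4 we
find the network output sequences to be

$$s_{i,k+1} = \Pi^{i+k}\, s_{-k,1-i} + \sum_{q=1}^{i+k} \Pi^{2q-1} p_{i-q,k} \cdot \Pi^{2q+k-i} r_{k,k-q+1}, \quad -(k-1) \le i \le k+1 \qquad (4.3.a)$$

$$s_{k+1,j} = \Pi^{j+k}\, s_{1-j,-k} + \sum_{q=1}^{k+j} \Pi^{2q+k-j} p_{k-q+1,k} \cdot \Pi^{2q-1} r_{k,j-q}, \quad -(k-1) \le j \le k \qquad (4.3.b)$$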


These are the network I/O descriptions. Of course, the
network is not expected to produce the elements of the
product matrix C unless the elements of the matrices
$A = \{a_{i,j}\}$ and $B = \{b_{i,j}\}$ are fed into the proper input
links of the network with the right timing. We will now prove
that the network output sequences will contain the elements of
C if the input sequences are specified as follows:
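$$p_{u,k} = \Pi^{2(k+u)}\, \epsilon_2\, \alpha_u, \quad -k \le u \le k \qquad (4.4.a)$$
$$r_{k,u} = \Pi^{2(k+u)}\, \epsilon_2\, \beta_u, \quad -k \le u \le k \qquad (4.4.b)$$
$$s_{-k,u} = \Pi^{2(2k+u)}\, \epsilon_2\, \iota_u, \quad -k \le u \le k \qquad (4.4.c)$$
$$s_{u,-k} = \Pi^{2(2k+u)}\, \epsilon_2\, \iota_u, \quad -k \le u \le k \qquad (4.4.d)$$

where $T(\beta_u) = T(\alpha_u) = n$, $T(\iota_u) = n-(k+u)$, $\iota_u(t) = 0$,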

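and the sequences $\beta_u$, $\alpha_u$ are defined as follows:

For $u < 0$:

$$\alpha_u(t) = \begin{cases} 0, & 1 \le t \le -u \\ a_{t,t+u}, & -u < t \le n \end{cases} \qquad (4.5.a)$$

$$\beta_u(t) = \begin{cases} 0, & 1 \le t \le -u \\ b_{t+u,t}, & -u < t \le n \end{cases} \qquad (4.5.b)$$

For $u \ge 0$:

$$\alpha_u(t) = \begin{cases} a_{t,t+u}, & 1 \le t \le n-u \\ 0, & n-u < t \le n \end{cases} \qquad (4.5.c)$$

$$\beta_u(t) = \begin{cases} b_{t+u,t}, & 1 \le t \le n-u \\ 0, & n-u < t \le n \end{cases} \qquad (4.5.d)$$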

Roughly speaking, the input link $r_{k,u}$, $-k \le u \le k$, contains
the u-th off-diagonal of the matrix B, while the input link
$p_{u,k}$, $-k \le u \le k$, contains the (-u)-th off-diagonal of the
matrix A (off-diagonals being numbered by row index minus
column index). Of course, the exact timing of the input data
is defined by the formulas (4.4).

For the sake of brevity, we consider here only the
equations (4.3.a) and show that the output links $s_{i,k+1}$,
$-(k-1) \le i \le k+1$, will carry the elements in the lower band of
the product matrix $C = A B$, including the diagonal. By a
similar procedure, one can use (4.3.b) to show that the
links $s_{k+1,j}$, $-(k-1) \le j \le k$, will carry the upper band of C.

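By introducing the specifications (4.4) of the network
input sequences into (4.3.a), we obtain for $-(k-1) \le i \le k+1$
the following formula:

$$\begin{aligned} s_{i,k+1} &= \bar\iota_i + \sum_{q=1}^{k+i} \Pi^{2k+2i-1} \epsilon_2\, \alpha_{i-q} \cdot \Pi^{5k-i+2} \epsilon_2\, \beta_{k-q+1} \\ &= \bar\iota_i + \Pi^{2k+2i-1} \sum_{q=1}^{k+i} \bigl[\epsilon_2\, \alpha_{i-q} \cdot \Pi^{3(k-i+1)} \epsilon_2\, \beta_{k-q+1}\bigr] \\ &= \bar\iota_i + \Pi^{2k+2i-1} \epsilon_2 \sum_{q=1}^{k+i} \bigl[\alpha_{i-q} \cdot \Pi^{k-i+1} \beta_{k-q+1}\bigr] \end{aligned}$$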


where $\bar\iota_i = \Pi^{5k-i+2} \epsilon_2\, \iota_{1-i}$. With property P7 the product
term becomes

$$s_{i,k+1} = \bar\iota_i + \Pi^{2k+2i-1} \epsilon_2 \sum_{q=1}^{k+i} \Pi^{k-i+1} \gamma_i^q \qquad (4.6)$$

where $T(\gamma_i^q) = n-(k-i+1)$ and

$$\gamma_i^q(t) = \alpha_{i-q}(t+k-i+1) \cdot \beta_{k-q+1}(t).$$

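Simplifying (4.6) with properties P2 and P4, and using the
definition of $\bar\iota_i$, we find that

$$s_{i,k+1} = \Pi^{5k-i+2} \epsilon_2\, \iota_{1-i} + \Pi^{5k-i+2} \epsilon_2 \sum_{q=1}^{k+i} \gamma_i^q = \Pi^{5k-i+2} \epsilon_2\, [\iota_{1-i} + \nu_i]$$

where $T(\nu_i) = n-(k-i+1)$ and

$$\nu_i(t) = \sum_{q=1}^{k+i} \gamma_i^q(t) = \sum_{q=1}^{k+i} \alpha_{i-q}(t+k-i+1)\, \beta_{k-q+1}(t), \quad -(k-1) \le i \le k+1. \qquad (4.7)$$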

Finally, from the definition of $\iota_{1-i}$ we obtain that

$$s_{i,k+1} = \Pi^{5k-i+2} \epsilon_2\, \nu_i, \quad -(k-1) \le i \le k+1. \qquad (4.8)$$


Equation (4.8) describes the timing of the output data
on any link $s_{i,k+1}$, $-(k-1) \le i \le k+1$. It indicates that on
$s_{i,k+1}$ there will be an initial set-up time of $5k-i+2$
units, after which the elements $\nu_i(t)$, $t = 1, 2, \ldots, n-(k-i+1)$,
will appear separated from each other by two time units.
We still need to show that $\nu_i(t) = c_{t+k-i+1,t}$, that is, that
$s_{i,k+1}$ carries the $(k-i+1)$st subdiagonal of the matrix C.
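Equation (4.8) and the claim $\nu_i(t) = c_{t+k-i+1,t}$ can also be checked numerically by iterating the difference equations (4.1) cell by cell, in the spirit of the numerical procedure mentioned in the concluding remarks. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: θ is represented by `None`; $\epsilon_r$ is taken to place r don't-care elements after every element (the reading under which property P2 holds); and the '+' of (4.1.c) is evaluated with θ meaning "no data", as for ⊕. All function names are hypothetical.

```python
import random

THETA = None  # don't-care element θ (representation assumed for this sketch)


def delay(seq, r=1):
    """Pi^r: delay a sequence by r time units (prepend r don't-cares)."""
    return [THETA] * r + list(seq)


def spread(seq, r):
    """eps_r (assumed reading): follow every element by r don't-cares."""
    out = []
    for x in seq:
        out += [x] + [THETA] * r
    return out


def _at(seq, t):
    return seq[t] if t < len(seq) else THETA


def oplus(a, b):
    """Sum in which theta means 'no data' (the operand that is present wins)."""
    out = []
    for t in range(max(len(a), len(b))):
        x, y = _at(a, t), _at(b, t)
        out.append(y if x is THETA else x if y is THETA else x + y)
    return out


def times(a, b):
    """Elementwise product; theta wherever either operand is theta."""
    out = []
    for t in range(max(len(a), len(b))):
        x, y = _at(a, t), _at(b, t)
        out.append(THETA if x is THETA or y is THETA else x * y)
    return out


def hex_network(A, B, k):
    """Iterate (4.1) over all cells (i, j), -k <= i, j <= k, with inputs (4.4)."""
    n = len(A)

    def alpha(u):  # (4.5.a/c): alpha_u(t) = a_{t,t+u}, 0 outside the matrix
        return [A[t][t + u] if 0 <= t + u < n else 0 for t in range(n)]

    def beta(u):   # (4.5.b/d): beta_u(t) = b_{t+u,t}, 0 outside the matrix
        return [B[t + u][t] if 0 <= t + u < n else 0 for t in range(n)]

    def iota(u):   # zero sequence with T(iota_u) = n - (k + u)
        return [0] * max(0, n - (k + u))

    p, r, s = {}, {}, {}
    for u in range(-k, k + 1):                          # input timing (4.4)
        p[(u, k)] = delay(spread(alpha(u), 2), 2 * (k + u))
        r[(k, u)] = delay(spread(beta(u), 2), 2 * (k + u))
        s[(-k, u)] = delay(spread(iota(u), 2), 2 * (2 * k + u))
        s[(u, -k)] = delay(spread(iota(u), 2), 2 * (2 * k + u))
    for a in range(-k, k + 1):
        for b in range(k, -k, -1):
            p[(a, b - 1)] = delay(p[(a, b)])            # (4.1.a)
            r[(b - 1, a)] = delay(r[(b, a)])            # (4.1.b)
    for i in range(-k, k + 1):                          # (4.1.c), along the diagonals
        for j in range(-k, k + 1):
            s[(i + 1, j + 1)] = delay(oplus(s[(i, j)], times(p[(i, j)], r[(i, j)])))
    return s


def check(n, k, seed=0):
    """Compare the simulated output links with C = A B for random band matrices."""
    rng = random.Random(seed)

    def band():
        return [[rng.randint(-5, 5) if abs(x - y) <= k else 0 for y in range(n)]
                for x in range(n)]

    A, B = band(), band()
    C = [[sum(A[x][z] * B[z][y] for z in range(n)) for y in range(n)] for x in range(n)]
    s = hex_network(A, B, k)
    for i in range(-(k - 1), k + 2):                    # lower band, (4.8)
        d, D, out = k + 1 - i, 5 * k - i + 2, s[(i, k + 1)]
        assert all(x is THETA for x in out[:D])
        assert all(out[D + 3 * m] == C[m + d][m] for m in range(n - d))
    for j in range(-(k - 1), k + 1):                    # upper band (analogous timing)
        e, D, out = k + 1 - j, 5 * k - j + 2, s[(k + 1, j)]
        assert all(out[D + 3 * m] == C[m][m + e] for m in range(n - e))
    return True


print(check(n=7, k=1), check(n=9, k=2, seed=3))
```

The check reads $s_{i,k+1}$ at the 0-based positions $5k-i+2+3m$ predicted by (4.8). For the upper band it uses the delay $5k-j+2$, which is obtained by applying the same argument to (4.3.b); that timing is an assumption of this sketch and is not stated in the text.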
To evaluate $\nu_i(t)$ from (4.7), we use the definitions
(4.5) to write $\alpha_{i-q}(t+k-i+1)$ and $\beta_{k-q+1}(t)$ for the values
of t between 1 and $n-(k-i+1)$, which are the values of t
assumed in (4.7). The resulting formulas are:


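$$\alpha_u(t+d) = \begin{cases} 0 & \text{if } u < 0 \text{ and } 1 \le t \le q-(k+1) \\ a(t,i,q) & \text{if } u < 0 \text{ and } q-(k+1) < t \le n-d \\ a(t,i,q) & \text{if } u \ge 0 \text{ and } 1 \le t \le n+q-(k+1) \\ 0 & \text{if } u \ge 0 \text{ and } n+q-(k+1) < t \le n-d \end{cases} \qquad (4.9.a)$$

$$\beta_v(t) = \begin{cases} 0 & \text{if } v < 0 \text{ and } 1 \le t \le q-(k+1) \\ b(t,q) & \text{if } v < 0 \text{ and } q-(k+1) < t \le n-d \\ b(t,q) & \text{if } v \ge 0 \text{ and } 1 \le t \le n+q-(k+1) \\ 0 & \text{if } v \ge 0 \text{ and } n+q-(k+1) < t \le n-d \end{cases} \qquad (4.9.b)$$

where, for simplicity, we introduced the notation

$$u = i-q, \quad v = k-q+1, \quad d = (k+1)-i, \quad a(t,i,q) = a_{t+d,\,t+d+u}, \quad b(t,q) = b_{t+v,\,t},$$

which will be used repeatedly in the remainder of this section.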


It is clear from (4.9) that the evaluation of $\nu_i(t)$ by
(4.7) is non-trivial and depends on the relative values of i
and q. For this purpose, we consider two different cases:

Case 1: If $-(k-1) \le i \le 0$.

In this case and for $1 \le q \le k+i$, the inequalities $u = i-q < 0$ and
$v = k-q+1 \ge 0$ always hold. Moreover, we have $q-(k+1) < 0$ and
$n+q-(k+1) > n-d$. Accordingly, we can use the above conditions
to determine the appropriate values of $\alpha_u(t+d)$ and
$\beta_v(t)$ from (4.9), and with these in (4.7) we obtain the
formula:

$$\nu_i(t) = \sum_{q=1}^{k+i} a_{t+d,\,t+k+1-q}\, b_{t+k+1-q,\,t}, \quad 1 \le t \le n-d.$$

By changing the summation index to $j = t+k+1-q$ this becomes

$$\nu_i(t) = \sum_{j=t+d-k}^{t+k} a_{t+d,j}\, b_{j,t}, \quad 1 \le t \le n-d. \qquad (4.10)$$

Case 2: If $1 \le i \le k+1$.

In this case we always have $u = i-q \le v = k-q+1$. Accordingly,
we divide the sum in (4.7) into the three partial sums

$$\sum_{q=1}^{k+i} = \sum_{q=1}^{i} + \sum_{q=i+1}^{k} + \sum_{q=k+1}^{k+i}.$$

For simplicity, we refer to these three sums as $\Sigma_1$, $\Sigma_2$ and
$\Sigma_3$, respectively, and evaluate them separately.

In the case of $\Sigma_1 = \sum_{q=1}^{i} \gamma_i^q(t)$, we note that the condition
$1 \le q \le i$ implies that $v \ge u \ge 0$. Hence, by (4.9) we have

$$\gamma_i^q(t) = \begin{cases} a(t,i,q)\, b(t,q) & \text{if } 1 \le t \le n+q-(k+1) \\ 0 & \text{if } n+q-(k+1) < t \le n-d \end{cases}$$

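By standard rules of operations with summation symbols, $\Sigma_1$
can be expressed as

$$\Sigma_1 = \begin{cases} \sum_{q=1}^{i} a(t,i,q)\, b(t,q) & \text{if } 1 \le t \le n-k \\ \sum_{q=t-n+k+1}^{i} a(t,i,q)\, b(t,q) & \text{if } n-k < t \le n-d \end{cases} \qquad (4.11)$$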


We turn next to $\Sigma_2 = \sum_{q=i+1}^{k} \gamma_i^q(t)$. In this case, we have
$u < 0 \le v$, $q-(k+1) < 0$ and $n+q-(k+1) > n-d$. Hence, from (4.9) it
follows that

$$\gamma_i^q(t) = a(t,i,q)\, b(t,q), \quad 1 \le t \le n-d,$$

which gives directly

$$\Sigma_2 = \sum_{q=i+1}^{k} a(t,i,q)\, b(t,q), \quad 1 \le t \le n-d. \qquad (4.12)$$

Finally, in the case of $\Sigma_3$ the inequality $u \le v \le 0$ holds.
Therefore, we have

$$\gamma_i^q(t) = \begin{cases} 0 & \text{if } 1 \le t \le q-(k+1) \\ a(t,i,q)\, b(t,q) & \text{if } q-(k+1) < t \le n-d \end{cases}$$

which gives

$$\Sigma_3 = \begin{cases} \sum_{q=k+1}^{k+t} a(t,i,q)\, b(t,q) & \text{if } 1 \le t \le i \\ \sum_{q=k+1}^{k+i} a(t,i,q)\, b(t,q) & \text{if } i < t \le n-d \end{cases} \qquad (4.13)$$


Now $\nu_i(t)$ is obtained by adding the sums (4.11),
(4.12) and (4.13) on three different intervals for t. This
sum is given by

$$\nu_i(t) = \begin{cases} \sum_{q=1}^{k+t} a(t,i,q)\, b(t,q) & 1 \le t \le i \\ \sum_{q=1}^{k+i} a(t,i,q)\, b(t,q) & i < t \le n-k \\ \sum_{q=t-n+k+1}^{k+i} a(t,i,q)\, b(t,q) & n-k < t \le n-d \end{cases}$$


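By changing the summation index to $j = t+k+1-q$ and
substituting the appropriate values for $a(t,i,q)$ and $b(t,q)$ we
obtain

$$\nu_i(t) = \begin{cases} \sum_{j=1}^{t+k} a_{t+d,j}\, b_{j,t} & 1 \le t \le i \\ \sum_{j=t+d-k}^{t+k} a_{t+d,j}\, b_{j,t} & i < t \le n-k \\ \sum_{j=t+d-k}^{n} a_{t+d,j}\, b_{j,t} & n-k < t \le n-d \end{cases}$$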


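Note that the above formula for $\nu_i(t)$ is valid for
$1 \le i \le k+1$, while (4.10) is valid for $-(k-1) \le i \le 0$. These two
formulas are equivalent to those resulting from multiplying
the two band matrices A and B: since $a_{t+d,j} = 0$ for $|t+d-j| > k$
and $b_{j,t} = 0$ for $|j-t| > k$, the summation limits above only
discard vanishing terms of $c_{t+d,t} = \sum_{j=1}^{n} a_{t+d,j}\, b_{j,t}$. This proves
that for $t = 1, 2, \ldots, n-(k-i+1)$ and $-(k-1) \le i \le k+1$, we have indeed

$$\nu_i(t) = c_{t+d,t} = c_{t+k-i+1,t}.$$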
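The summation limits of (4.10) and of the three-interval formula can be checked against an ordinary matrix product. The sketch below evaluates $\nu_i(t)$ from those formulas and compares it with $c_{t+d,t}$ for random band matrices of half-bandwidth k. It assumes $n \ge 2k+1$, so that the three intervals of case 2 do not overlap, and its function names are hypothetical.

```python
import random


def nu(A, B, k, i, t):
    """nu_i(t) from (4.10) (case 1) or from the three-interval formula (case 2).
    A and B are lists of lists; i, t and the summation index j are 1-based."""
    n = len(A)
    d = k + 1 - i
    if i <= 0 or i < t <= n - k:     # (4.10), or case 2 with i < t <= n-k
        lo, hi = t + d - k, t + k
    elif t <= i:                     # case 2 with 1 <= t <= i
        lo, hi = 1, t + k
    else:                            # case 2 with n-k < t <= n-d
        lo, hi = t + d - k, n
    return sum(A[t + d - 1][j - 1] * B[j - 1][t - 1] for j in range(lo, hi + 1))


def check_nu(n, k, seed=0):
    """Compare nu_i(t) with c_{t+d,t} of C = A B for random band matrices, n >= 2k+1."""
    rng = random.Random(seed)

    def band():
        return [[rng.randint(-5, 5) if abs(x - y) <= k else 0 for y in range(n)]
                for x in range(n)]

    A, B = band(), band()
    for i in range(-(k - 1), k + 2):
        d = k + 1 - i
        for t in range(1, n - d + 1):
            c = sum(A[t + d - 1][z] * B[z][t - 1] for z in range(n))
            assert nu(A, B, k, i, t) == c, (i, t)
    return True


print(check_nu(9, 2), check_nu(12, 3, seed=5))
```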
5. A back substitution network.


In this section, we apply our verification technique to
a systolic network that contains two different types of
computational cells, namely the back-substitution network
suggested in [8]. This network performs the back substitution
operation to solve the linear system of equations

$$L\, u = y \qquad (5.1)$$

where L is an n×n non-singular, banded, lower triangular
matrix with the band width k+1, and y is a given n-dimensional
vector. The solution of the system (5.1) is
given by the formula:


$$u_i = \begin{cases} y_1 / l_{1,1}, & i = 1 \\ \bigl(y_i - \sum_{j=1}^{i-1} l_{i,i-j}\, u_{i-j}\bigr) / l_{i,i}, & 2 \le i \le k \\ \bigl(y_i - \sum_{j=1}^{k} l_{i,i-j}\, u_{i-j}\bigr) / l_{i,i}, & k < i \le n \end{cases}$$

where $l_{i,j}$ is the (i,j) element of the matrix L, and $y_i$
and $u_i$ are the ith elements of the vectors y and u,
respectively.
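The formula above can be transcribed directly. The Python sketch below only evaluates this recurrence for $u_i$ and does not model the systolic network. The function name is hypothetical, and indices are 0-based in the code. For a lower triangular L this procedure is also commonly called forward substitution.

```python
from fractions import Fraction


def band_back_substitution(L, y, k):
    """Solve L u = y for a nonsingular lower-triangular band matrix L with band
    width k+1 (only l_{i,i-j}, 0 <= j <= k, may be nonzero). 0-based indices."""
    n = len(y)
    u = [None] * n
    for i in range(n):
        acc = y[i]
        for j in range(1, min(i, k) + 1):    # the sum over j = 1..min(i-1, k), 1-based
            acc -= L[i][i - j] * u[i - j]
        u[i] = acc / L[i][i]
    return u


# usage: a 5x5 example with k = 2, exact arithmetic via Fraction
L = [[Fraction(0)] * 5 for _ in range(5)]
for i in range(5):
    for j in range(max(0, i - 2), i + 1):
        L[i][j] = Fraction(i + j + 1)
u_true = [Fraction(v) for v in (1, -2, 3, 0, 4)]
y = [sum(L[i][j] * u_true[j] for j in range(5)) for i in range(5)]
print(band_back_substitution(L, y, 2) == u_true)   # True
```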


Figure 9 shows the graph of the suggested network. It
is a 2-partially homogeneous network, composed of k
multiply/add (M/A) type cells and one subtract/divide (S/D)
cell. The computational cells are labeled by integers such
that the cells 1 through k are of the M/A type, and the cell
0 is the S/D cell. As for the I/O cells, we must be careful
to assign labels to the sink cells because these labels will
be used to identify the network output links. The labels
given to source nodes are immaterial as they do not affect
the verification procedure, and consequently are not shown
in figure 9.

In the regular layout shown in figure 9, the edges
directed to the south, north, east and west are given the
colors a, b, r and s, respectively. The set $V_I$ of interior
nodes in G is divided into two homogeneous subsets $V_I^1 = \{0\}$
and $V_I^2 = \{i : i = 1, 2, \ldots, k\}$. The operation of the cell
represented by node '0' is described by the causal relation

$$\rho_1 = \Pi\,\bigl[[\beta_0 - \sigma_0] \div \alpha_0\bigr] \qquad (5.2)$$

and the operation of any M/A cell represented by a node i,
$1 \le i \le k$, is described by the generic I/O description

$$\rho_{i+1} = \Pi\, \rho_i, \quad i = 1, 2, \ldots, k \qquad (5.3.a)$$
$$\sigma_{i-1} = \Pi\,[\sigma_i \oplus \alpha_i\, \rho_i], \quad i = 1, 2, \ldots, k \qquad (5.3.b)$$

where the $\oplus$ was defined by (2.1).

To solve the system of difference equations (5.2),
(5.3.a/b), we first write the solution of (5.3.a) as

$$\rho_i = \Pi^{i-1} \rho_1, \quad 1 \le i \le k+1 \qquad (5.4)$$

from which we find that

$$\rho_{k+1} = \Pi^{k} \rho_1. \qquad (5.5)$$

Substitution of (5.4) into (5.3.b) then gives

$$\sigma_{i-1} = \Pi\,[\sigma_i \oplus \Delta_i] \qquad (5.6)$$

where $\Delta_i = \alpha_i \cdot \Pi^{i-1} \rho_1$. Using an inductive argument
similar to that in the appendix for the proof of lemma 1, we
can show that the solution of (5.6) is

$$\sigma_0 = \Pi^{k} \sigma_k \oplus {\textstyle\sum'_{j=1}^{k}} \Pi^{j}\,[\alpha_j \cdot \Pi^{j-1} \rho_1] \qquad (5.7)$$

where $\sum'$ is defined by $\sum'_{j=1}^{k} \xi_j = \xi_1 \oplus \xi_2 \oplus \cdots \oplus \xi_k$.

For given $\rho_1$, the network output sequence $\rho_{k+1}$ is
easily obtained from (5.5). The next step will be to eliminate
$\sigma_0$ from (5.2) and (5.7) and to obtain $\rho_1$ explicitly
in terms of the network input sequences $\sigma_k$, $\beta_0$ and $\alpha_j$,
$j = 0, 1, \ldots, k$. Unfortunately, if we try to solve (5.2) and
(5.7) simultaneously, we will obtain a recursive equation in
$\rho_1$, which is very difficult to manipulate in general. For
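$$\ldots = \Pi^{k} \epsilon\, \bigl[\iota \oplus {\textstyle\sum'_{j=1}^{k}} [\lambda_j \cdot \Pi^{j} \xi]\bigr]$$

where we used property P2 to interchange $\Pi^{2j}$ and $\epsilon$. If in
addition we let

$$\gamma = \iota \oplus {\textstyle\sum'_{j=1}^{k}} [\lambda_j \cdot \Pi^{j} \xi] \qquad (5.11)$$

then we can substitute for $\sigma_0$ and $\rho_1$ in (5.9.a) and obtain

$$\Pi^{k+1} \epsilon\, \xi = \Pi\,\bigl[[\Pi^{k} \epsilon\, \eta - \Pi^{k} \epsilon\, \gamma] \div \Pi^{k} \epsilon\, \lambda_0\bigr]$$

which reduces to

$$\xi = [\eta - \gamma] \div \lambda_0. \qquad (5.12)$$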

For an explicit description of the sequence γ, we need
to examine (5.11) more closely. We start by applying
property P7 to the product term in (5.11), namely

$$\lambda_j \cdot \Pi^{j} \xi = \Pi^{j} \mu_j$$

where

$$T(\mu_j) = \min\{T(\lambda_j) - j,\; T(\xi)\} \qquad (5.13.a)$$

and

$$\mu_j(t) = \lambda_j(t+j) \cdot \xi(t). \qquad (5.13.b)$$

This enables us to rewrite (5.11) as

$$\gamma = \iota \oplus {\textstyle\sum'_{j=1}^{k}} \Pi^{j} \mu_j. \qquad (5.14)$$
From (5.14) and the definition of the '⊕' operator, we
conclude that $T(\gamma) = \max(T(\iota), T(\mu_j)+j) = n$, and
consequently from (5.12) that

$$T(\xi) = \min\{T(\eta), T(\gamma), T(\lambda_0)\} = n.$$

Using this in (5.13.a) we easily see that $T(\mu_j) = n-j$.
Now, we apply property P6 to (5.14) and explicitly describe
γ by $T(\gamma) = T(\iota) = n$ and

$$\gamma(t) = \begin{cases} 0, & t = 1 \\ \sum_{j=1}^{t-1} \mu_j(t-j), & t = 2, 3, \ldots, k \\ \sum_{j=1}^{k} \mu_j(t-j), & t = k+1, k+2, \ldots, n \end{cases}$$

Finally, with these specific descriptions of η, $\lambda_0$ and
γ, we directly find the explicit form of the sequence ξ in
(5.12) to be

$$\xi(t) = (\eta(t) - \gamma(t)) / \lambda_0(t)$$

that is

$$\xi(t) = \begin{cases} y_1 / l_{1,1}, & t = 1 \\ \bigl(y_t - \sum_{j=1}^{t-1} l_{t,t-j}\, \xi(t-j)\bigr) / l_{t,t}, & 2 \le t \le k \\ \bigl(y_t - \sum_{j=1}^{k} l_{t,t-j}\, \xi(t-j)\bigr) / l_{t,t}, & k+1 \le t \le n \end{cases}$$

A comparison of this expression with the formula given
in the beginning of the section for the solution of (5.1)
shows readily that

$$\rho_{k+1} = \Pi^{k} \rho_1 = \Pi^{2k+1} \epsilon\, \xi$$

where $T(\xi) = n$ and $\xi(t) = u_t$.
6. A sorting network

The sorting network [2,9] described here accepts an
indexed set $X = \{x_1, \ldots, x_k\}$ of k different real numbers,
$x_i \in R$, $i \in K = \{1, \ldots, k\}$, and produces as output the same
numbers sorted in ascending order. Figure 10 shows the general
graph of the network and the labels given to each node.
In the figure, the edges directed to the right and left are
colored p and s, respectively.

For any $j \in K$, let $y_1, \ldots, y_j$ be the result of sorting
the j elements $x_1, \ldots, x_j$ of X in ascending order. Then for
all (i,j) of $D = \{(i,j) \in K \times K : 1 \le i \le j \le k\}$, the ranking
function $f_X : D \to X$ is defined by $f_X(i,j) = y_i$.
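A direct transcription of this definition may help when reading the proofs below. The helper name `ranking` is hypothetical, and indices are 1-based as in the text.

```python
def ranking(xs, i, j):
    """f_X(i, j): the i-th smallest among x_1, ..., x_j (1 <= i <= j <= k)."""
    if not 1 <= i <= j <= len(xs):
        raise ValueError("need 1 <= i <= j <= k")
    return sorted(xs[:j])[i - 1]


X = [5, 2, 9, 1]
print(ranking(X, 1, 1), ranking(X, 2, 3), ranking(X, 1, 4), ranking(X, 4, 4))  # 5 5 1 9
```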

With this, we will prove that if the network input
sequence is given by

$$\pi_k = \epsilon\, \xi \qquad (6.1)$$

where $T(\xi) = k$ and $\xi(t) = x_t$, then the network output
sequence $\sigma_{k+1}$ has the form

$$\sigma_{k+1} = \Pi^{2k-1} \epsilon\, \nu \qquad (6.2)$$

where $T(\nu) = k$ and $\nu(t) = f_X(t,k)$.

Figure (10)

The network considered in figure 10 is a 2-partially
homogeneous network. The cell labeled '1' is a simple latch
cell whose operation is described by

$$\sigma_2 = \Pi\, \pi_1 \qquad (6.3.a)$$

while the I/O description of the cells $i = 2, \ldots, k$ is given by

$$\pi_{i-1} = \Pi\, \max\nolimits_\theta(\pi_i, \sigma_i) \qquad (6.3.b)$$
$$\sigma_{i+1} = \Pi\, \min\nolimits_\theta(\pi_i, \sigma_i) \qquad (6.3.c)$$

where $\max_\theta$ and $\min_\theta$ were defined in section 2.1. In other
words, the cells $i = 2, \ldots, k$ are comparison cells which
operate as follows: At any time t, if neither one of the two
inputs $\sigma_i(t)$ or $\pi_i(t)$ is a don't care element θ, then the
cell compares the two inputs and produces as output at time
t+1 the largest and the smallest numbers on the links $\pi_{i-1}$
and $\sigma_{i+1}$, respectively. However, if any of the inputs is θ,
then the cell acts as a simple latch cell, that is, if
$\sigma_i(t) = \theta$ or $\pi_i(t) = \theta$ then

$$\pi_{i-1}(t+1) = \pi_i(t) \quad\text{and}\quad \sigma_{i+1}(t+1) = \sigma_i(t).$$

To obtain the network I/O description, the system of
equations (6.3.a/b/c) should be solved for $\sigma_{k+1}$. However,
the recursive nature of (6.3.b) and (6.3.c) makes this very
difficult, if not impossible. One possible alternative is
to suggest a tentative value for the sequences $\pi_i$ and $\sigma_i$,
and then to verify that these suggested solutions indeed
satisfy (6.3). Of course, any assumed value for $\pi_i$ should
reduce to the input sequence (6.1) for $i = k$.

Let us assume that $\pi_i$ and $\sigma_i$ are given by

$$\pi_i = \Pi^{k-i} \epsilon\, \alpha_i, \quad 1 \le i \le k \qquad (6.4.a)$$
$$\sigma_i = \Pi^{k+i-2} \epsilon\, \beta_i, \quad 2 \le i \le k+1 \qquad (6.4.b)$$

where $T(\alpha_i) = T(\beta_i) = k$,

$$\alpha_i(t) = \begin{cases} x_t, & 1 \le t \le i \\ \max\{x_t, f_X(t-i, t-1)\}, & i < t \le k \end{cases}$$

and

$$\beta_i(t) = \begin{cases} f_X(t, t+i-2), & 1 \le t \le k+1-i \\ f_X(t, k), & k+1-i < t \le k \end{cases}$$



It is very easy to verify that (6.4.a) reduces to (6.1)
for $i = k$. Hence, our next step will be to check that (6.4)
does satisfy (6.3). For $i = 1$, (6.4.a) reduces to

$$\pi_1 = \Pi^{k-1} \epsilon\, \alpha_1$$

where $T(\alpha_1) = k$, and

$$\alpha_1(t) = \begin{cases} x_t, & t = 1 \\ \max\{x_t, f_X(t-1, t-1)\}, & 1 < t \le k \end{cases}$$

Since $f_X(t-1,t-1)$ is the maximum element in $\{x_1, x_2, \ldots, x_{t-1}\}$,
it follows that $x_1 = f_X(1,1)$ and $\max\{x_t, f_X(t-1,t-1)\} = f_X(t,t)$.
Hence, we may write

$$\alpha_1(t) = f_X(t,t), \quad 1 \le t \le k.$$

But from (6.4.b), we obtain for $i = 2$

$$\sigma_2 = \Pi^{k} \epsilon\, \beta_2$$

where $T(\beta_2) = k$ and $\beta_2(t) = f_X(t,t)$, $1 \le t \le k$, which
proves that $\beta_2 = \alpha_1$, and hence $\sigma_2 = \Pi\, \pi_1$, i.e., (6.3.a) is satisfied.

The next step is to show that (6.4) does satisfy
(6.3.b). For this, we substitute (6.4) into the right-hand
side of (6.3.b) and denote the resulting sequence by ρ.
This gives
$$\rho = \Pi\, \max\nolimits_\theta(\Pi^{k-i} \epsilon\, \alpha_i,\; \Pi^{k+i-2} \epsilon\, \beta_i), \quad 2 \le i \le k.$$

Using property P2 to interchange $\Pi^{2(i-1)}$ and ε in the
second operand of $\max_\theta$ we obtain

$$\rho = \Pi^{k-(i-1)} \epsilon\, \gamma_i \qquad (6.5)$$

where $\gamma_i = \max_\theta\{\alpha_i, \Pi^{i-1} \beta_i\}$. By definition of $\max_\theta$, it
follows that $T(\gamma_i) = T(\alpha_i) = k$, and

$$\gamma_i(t) = \begin{cases} \alpha_i(t), & 1 \le t \le i-1 \\ \max\{\alpha_i(t), \beta_i(t-i+1)\}, & i-1 < t \le k \end{cases}$$

Hence, with the definitions of $\alpha_i(t)$ and $\beta_i(t)$ we obtain

$$\gamma_i(t) = \begin{cases} x_t, & 1 \le t \le i-1 \\ \max\{x_t, f_X(t-i+1, t-1)\}, & t = i \\ \max\{\max\{x_t, f_X(t-i, t-1)\}, f_X(t-i+1, t-1)\}, & i < t \le k \end{cases}$$

Because of $\max\{\max\{a,b\}, c\} = \max\{a,b,c\}$, and
$f_X(t-i, t-1) < f_X(t-i+1, t-1)$, we may rewrite $\gamma_i$ as

$$\gamma_i(t) = \begin{cases} x_t, & 1 \le t \le i-1 \\ \max\{x_t, f_X(t-(i-1), t-1)\}, & i-1 < t \le k \end{cases}$$

from which we find that $\gamma_i(t) = \alpha_{i-1}(t)$, and hence, by
(6.5) and (6.4.a), that $\rho = \pi_{i-1}$. This proves that (6.3.b)
is satisfied for the values of $\sigma_i$ and $\pi_i$ given by (6.4).

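Finally, to check that (6.4) does satisfy (6.3.c), we
substitute (6.4) into (6.3.c) and denote the resulting
sequence by r. This gives

$$\begin{aligned} r &= \Pi\, \min\nolimits_\theta(\Pi^{k-i} \epsilon\, \alpha_i,\; \Pi^{k+i-2} \epsilon\, \beta_i), \quad 2 \le i \le k \\ &= \Pi^{k-i+1} \epsilon\, \min\nolimits_\theta(\alpha_i,\; \Pi^{i-1} \beta_i). \end{aligned}$$

In view of

$$\min\nolimits_\theta(\alpha_i, \Pi^{i-1} \beta_i) = \Pi^{i-1} \varphi_i$$

where $T(\varphi_i) = T(\beta_i) = k$ and

$$\varphi_i(t) = \begin{cases} \min\{\alpha_i(t+i-1), \beta_i(t)\}, & 1 \le t \le k-(i-1) \\ \beta_i(t), & k-(i-1) < t \le k \end{cases}$$

we write

$$r = \Pi^{k+(i+1)-2} \epsilon\, \varphi_i. \qquad (6.6)$$

From (6.6) and (6.3.c), it follows that $r = \sigma_{i+1}$ only
if $\varphi_i = \beta_{i+1}$. To prove this, we substitute the definitions
of $\alpha_i(t+i-1)$ and $\beta_i(t)$ into $\varphi_i(t)$ and obtain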
$$\varphi_i(t) = \begin{cases} \min\{\max\{x_{t+i-1}, f_X(t-1, t+i-2)\}, f_X(t, t+i-2)\}, & 1 \le t \le k-(i-1) \\ f_X(t,k), & k-(i-1) < t \le k \end{cases}$$

But from lemma 2 in the appendix, and the fact that
$f_X(t, t+i-1) = f_X(t,k)$ for $t = k-i+1$, we may write $\varphi_i(t)$ as

$$\varphi_i(t) = \begin{cases} f_X(t, t+i-1), & 1 \le t \le k-i \\ f_X(t,k), & k-i < t \le k \end{cases}$$

It follows that $\varphi_i = \beta_{i+1}$ and therefore that
$r = \sigma_{i+1}$. This completes the proof that the sequences $\pi_i$
and $\sigma_i$ of (6.4) indeed satisfy the system of equations
(6.3).


Now that (6.4.b) is known to be a valid formula for the
sequence $\sigma_i$, we can easily obtain the network output
sequence $\sigma_{k+1}$ by setting $i = k+1$. This gives

$$\sigma_{k+1} = \Pi^{2k-1} \epsilon\, \beta_{k+1}$$

where $T(\beta_{k+1}) = k$ and $\beta_{k+1}(t) = f_X(t,k)$, $1 \le t \le k$, which is
identical with the expected output sequence (6.2).
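The result (6.2) can be observed in a cycle-by-cycle simulation of the cells (6.3). The sketch below is illustrative only and rests on these assumptions: θ is represented by `None`; the cells exchange values exactly as described for (6.3.a/b/c), including the latch behavior when an input is θ; and ε is taken to insert one don't-care after every element, so that (6.1) feeds $x_1, \theta, x_2, \theta, \ldots$ into cell k. The function names are hypothetical.

```python
THETA = None  # don't-care element θ (representation assumed for this sketch)


def sorting_network(xs, horizon=None):
    """Cycle-by-cycle simulation of the cells of figure 10 (cell 1 = latch,
    cells 2..k = comparison cells). Returns sigma_{k+1}(t) for t = 1..horizon."""
    k = len(xs)
    inp = [v for x in xs for v in (x, THETA)]       # pi_k = eps(xi), cf. (6.1)
    horizon = horizon or 4 * k + 2                  # long enough to drain the network
    pi = {i: THETA for i in range(1, k + 1)}        # pi_i(t)
    sigma = {i: THETA for i in range(2, k + 2)}     # sigma_i(t)
    pi[k] = inp[0]
    out = []
    for t in range(1, horizon + 1):
        out.append(sigma[k + 1])                    # record sigma_{k+1}(t)
        new_pi = {k: inp[t] if t < len(inp) else THETA}
        new_sigma = {2: pi[1]}                      # (6.3.a): latch cell 1
        for i in range(2, k + 1):                   # (6.3.b), (6.3.c)
            a, b = pi[i], sigma[i]
            if a is THETA or b is THETA:            # behave as a simple latch
                new_pi[i - 1], new_sigma[i + 1] = a, b
            else:
                new_pi[i - 1], new_sigma[i + 1] = max(a, b), min(a, b)
        pi, sigma = new_pi, new_sigma
    return out


def read_output(xs):
    """Pick the elements predicted by (6.2): sigma_{k+1} = Pi^{2k-1} eps nu,
    so nu(m) appears at time 2k-1 + 2(m-1) + 1."""
    k = len(xs)
    out = sorting_network(xs)
    return [out[2 * (k + m - 1) - 1] for m in range(1, k + 1)]


print(read_output([3, 1, 2]))   # [1, 2, 3]
```

Since each comparison cell passes on exactly the two items it receives, all k numbers eventually leave through $\sigma_{k+1}$; the simulation shows them arriving in ascending order at the times given by (6.2).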
7. Concluding Remarks:

This work was meant to contribute to the area of systolic
architectures in three different ways, namely, by providing
a mathematical model for systolic networks, an unambiguous
description of their input and output data, and a
technique for the verification of their operation.

The central concepts in the present model are those of
data sequences and sequence operators. Although we only
defined the few operators that were used in the examples, it
should be clear that other sequence operators may be
introduced to model other types of computational cells.

A further step in this area is to develop a more complete
sequence algebra to provide a basis for a solvability
theory of the resulting system of difference equations on
sequences. More specifically, it would be desirable to
determine under which conditions an explicit analytical
solution for the system of difference equations can be
obtained. For a given network, this might determine the
properties to be satisfied by the successor function a and
the node I/O operators in order to verify analytically the
operation of the network. If a sufficiently flexible algebra
of this type were available, our model might prove to be
very powerful in the design of new systolic networks.

At this point, we note that even if we cannot solve the
resulting system of equations analytically, we can still use
a numerical iterative procedure to solve it. This approach
is very close to the simulation of systolic networks, but
appears to be more general and systematic.

Finally, we note that throughout this paper we assumed
the systolic network to operate synchronously. However, the
same model and techniques can be used for asynchronous
networks. The only difference is in the interpretation of the
ith element of a data sequence, which now has to denote the
ith data item that appeared on a communication link instead
of the data item that appeared on that link at time t = i.
Appendix

In the first part of this appendix, we list some
properties of sequence operators that have been used in the
paper. The verification of these properties is
straightforward from the definitions of the operators involved. In
the second part of the appendix, we prove two lemmas; the
first gives an analytical solution to a difference equation
that appears frequently in the verification of networks
containing multiply/add cells, while the second one proves an
equality that was needed in section 6.

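Let ξ, ζ and $\eta_j$, $j = 0, 1, 2, \ldots, k$ be sequences in $R_\theta$, and
$w \in R$; then

Property P1: $\Pi^r \Pi^k \xi = \Pi^{r+k} \xi$

Property P2: $\Pi^{(r+1)k} \epsilon_r\, \xi = \epsilon_r\, \Pi^k \xi$

Property P3: $w \cdot [\epsilon_k\, \xi] = \epsilon_k\, [w \cdot \xi]$
$w \cdot [\Pi^r \xi] = \Pi^r\, [w \cdot \xi]$

Property P4: For any binary operator 'op' extended from R
to $R_\theta$, we have
$\Pi^k\, [\xi \text{ 'op' } \zeta] = \Pi^k \xi \text{ 'op' } \Pi^k \zeta$
$\epsilon_r\, [\xi \text{ 'op' } \zeta] = \epsilon_r\, \xi \text{ 'op' } \epsilon_r\, \zeta$

Property P5: If $\eta_j$, $j = 1, 2, \ldots, k$ are such that $T(\eta_j) = n$, then

$$\sum_{j=1}^{k} \Pi^{j-1} \eta_j = \Pi^{k-1} \nu$$

where $T(\nu) = n-(k-1)$ and $\nu(t) = \sum_{j=1}^{k} \eta_j(t+k-j)$.

The next result uses the ⊕ of (2.1):

Property P6: Let the sequences $\eta_j$, $j = 0, 1, \ldots, k$ satisfy
$T(\eta_j) = n-j$; then

$$\eta_0 \oplus \Pi\, \eta_1 \oplus \Pi^2 \eta_2 \oplus \cdots \oplus \Pi^k \eta_k = \gamma$$

where $T(\gamma) = n$ and

$$\gamma(t) = \begin{cases} \sum_{j=0}^{t-1} \eta_j(t-j), & t = 1, 2, \ldots, k \\ \sum_{j=0}^{k} \eta_j(t-j), & t = k+1, k+2, \ldots, n \end{cases}$$

Property P7: Given $\xi, \zeta \in R_\theta$, then

$$\xi \cdot \Pi^r \zeta = \Pi^r \gamma$$

where γ is described by $T(\gamma) = \min(T(\xi) - r,\, T(\zeta))$ and
$\gamma(t) = \xi(t+r) \cdot \zeta(t)$.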
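Under an assumed reading of the operators, properties P1, P2, P6 and P7 can be spot-checked on random data. This sketch treats θ as `None`. It assumes that $\epsilon_r$ places r don't-cares after every element, that ⊕ ignores θ, and that a product has length $\min(T(\xi), T(\zeta))$. These definitions are illustrative assumptions, not the definitions of section 2, and the function names are hypothetical.

```python
import random

THETA = None  # don't-care element (assumed representation)


def delay(seq, r=1):            # Pi^r
    return [THETA] * r + list(seq)


def spread(seq, r):             # eps_r, assumed: each element followed by r don't-cares
    out = []
    for x in seq:
        out += [x] + [THETA] * r
    return out


def oplus(*seqs):               # ⊕, assumed: theta contributes nothing to a sum
    out = []
    for t in range(max(len(s) for s in seqs)):
        vals = [s[t] for s in seqs if t < len(s) and s[t] is not THETA]
        out.append(sum(vals) if vals else THETA)
    return out


def times(a, b):                # elementwise product, length min(T(a), T(b))
    return [THETA if x is THETA or y is THETA else x * y for x, y in zip(a, b)]


def check_properties(n=8, k=3, r=2, seed=1):
    rng = random.Random(seed)
    xi = [rng.randint(-9, 9) for _ in range(n)]
    zeta = [rng.randint(-9, 9) for _ in range(n)]
    # P1 and P2
    assert delay(delay(xi, r), k) == delay(xi, r + k)
    assert delay(spread(xi, r), (r + 1) * k) == spread(delay(xi, k), r)
    # P6: eta_j with T(eta_j) = n - j, j = 0..k
    etas = [[rng.randint(-9, 9) for _ in range(n - j)] for j in range(k + 1)]
    lhs = oplus(*[delay(e, j) for j, e in enumerate(etas)])
    gamma = [sum(etas[j][t - j] for j in range(min(t, k) + 1)) for t in range(n)]
    assert lhs == gamma
    # P7: xi * Pi^r zeta = Pi^r g with g(t) = xi(t+r) * zeta(t)
    g = [xi[t + r] * zeta[t] for t in range(min(len(xi) - r, len(zeta)))]
    assert times(xi, delay(zeta, r)) == delay(g, r)
    return True


print(check_properties())
```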


Lemma 1: The difference equation

$$\sigma_{i+1} = \Pi\, \sigma_i + \Delta_i, \quad i = 1, 2, \ldots, k+1 \qquad (a.1)$$

has the solution

$$\sigma_r = \Pi^{r-1} \sigma_1 + \sum_{j=1}^{r-1} \Pi^{j-1} \Delta_{r-j}, \quad r = 2, 3, \ldots, k+1. \qquad (a.2)$$

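Proof: The proof uses induction on i. Evidently, for i = 1 in
(a.1) we obtain

$$\sigma_2 = \Pi\, \sigma_1 + \Delta_1$$

which is identical to (a.2) for r = 2. Hence assume that for
any r = 2, 3, ..., k, $\sigma_r$ is given by (a.2); then from (a.1) it
follows that

$$\begin{aligned} \sigma_{r+1} &= \Pi\, \sigma_r + \Delta_r \\ &= \Pi\,\Bigl[\Pi^{r-1} \sigma_1 + \sum_{j=1}^{r-1} \Pi^{j-1} \Delta_{r-j}\Bigr] + \Delta_r \\ &= \Pi^{r} \sigma_1 + \sum_{j=1}^{r-1} \Pi^{j} \Delta_{r-j} + \Delta_r \\ &= \Pi^{r} \sigma_1 + \sum_{j=0}^{r-1} \Pi^{j} \Delta_{r-j} \\ &= \Pi^{r} \sigma_1 + \sum_{j=1}^{r} \Pi^{j-1} \Delta_{r+1-j} \end{aligned}$$

which proves that $\sigma_{r+1}$ is also given by (a.2).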
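Lemma 1 can also be spot-checked by comparing the closed form (a.2) with direct iteration of (a.1) on random data. In this sketch, θ is represented by `None`, and '+' on sequences is read as an addition in which θ contributes nothing. That reading is an assumption of the sketch, and the function names are hypothetical.

```python
import random

THETA = None  # don't-care element (assumed representation)


def delay(seq, r=1):
    return [THETA] * r + list(seq)


def oplus(*seqs):
    """Sequence sum in which theta contributes nothing (assumed reading of '+')."""
    out = []
    for t in range(max(len(s) for s in seqs)):
        vals = [s[t] for s in seqs if t < len(s) and s[t] is not THETA]
        out.append(sum(vals) if vals else THETA)
    return out


def by_recurrence(sigma1, deltas):
    """Iterate (a.1): sigma_{i+1} = Pi sigma_i + Delta_i; returns [sigma_1, sigma_2, ...]."""
    sigmas = [sigma1]
    for delta in deltas:
        sigmas.append(oplus(delay(sigmas[-1]), delta))
    return sigmas


def by_closed_form(sigma1, deltas, r):
    """(a.2): sigma_r = Pi^{r-1} sigma_1 + sum_{j=1}^{r-1} Pi^{j-1} Delta_{r-j}."""
    terms = [delay(sigma1, r - 1)]
    terms += [delay(deltas[r - j - 1], j - 1) for j in range(1, r)]
    return oplus(*terms)


def random_seq(rng, n):
    return [THETA if rng.random() < 0.2 else rng.randint(-9, 9) for _ in range(n)]


rng = random.Random(0)
k = 4
sigma1 = random_seq(rng, 6)
deltas = [random_seq(rng, rng.randint(3, 8)) for _ in range(k + 1)]   # Delta_1..Delta_{k+1}
sigmas = by_recurrence(sigma1, deltas)
print(all(sigmas[r - 1] == by_closed_form(sigma1, deltas, r) for r in range(1, k + 3)))
```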

Lemma 2: Let $f_X$ be the ranking function for the set
$X = \{x_1, x_2, \ldots, x_n\}$, as defined in section 6; then

$$\min\{\max\{x_k, f_X(i-1,k-1)\}, f_X(i,k-1)\} = f_X(i,k). \qquad (a.3)$$

Proof: Let $y_1, \ldots, y_{k-1}$ be the result of sorting $x_1, \ldots, x_{k-1}$
in ascending order, and $z_1, \ldots, z_k$ the corresponding
result for $x_1, \ldots, x_k$. Hence, $f_X(i-1,k-1) = y_{i-1}$,
$f_X(i,k-1) = y_i$ and $f_X(i,k) = z_i$. Now consider the following
cases:

1) If $x_k < y_{i-1} < y_i$, then the left side of (a.3) is

$$\min\{\max\{x_k, y_{i-1}\}, y_i\} = \min\{y_{i-1}, y_i\} = y_{i-1}.$$

Since $z_1, \ldots, z_k$ are obtained from $y_1, \ldots, y_{k-1}$ by inserting
$x_k$ in some position before $y_{i-1}$, we immediately see that
$y_{i-1} = z_i$.

2) If $y_{i-1} < x_k < y_i$, then the left side of (a.3) is

$$\min\{\max\{x_k, y_{i-1}\}, y_i\} = x_k$$

and in this case it is clear that $x_k = z_i$.

3) If $y_{i-1} < y_i < x_k$, then the left side of (a.3) is equal
to $y_i$, which in turn is equal to $z_i$ because, in this case,
$x_k$ is inserted in some position after $y_i$.
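An exhaustive check of (a.3) over all orderings of a few distinct numbers is cheap and agrees with the three cases above. The helper names are hypothetical, and indices are 1-based as in the text. The range of i is restricted to $2 \le i \le k-1$, so that both $f_X(i-1,k-1)$ and $f_X(i,k-1)$ are defined.

```python
import itertools


def ranking(xs, i, j):
    """f_X(i, j): the i-th smallest among x_1, ..., x_j."""
    return sorted(xs[:j])[i - 1]


def lemma2_holds(xs):
    """Check (a.3) for every i with 2 <= i <= k-1, where k = len(xs)."""
    k = len(xs)
    return all(
        min(max(xs[k - 1], ranking(xs, i - 1, k - 1)), ranking(xs, i, k - 1))
        == ranking(xs, i, k)
        for i in range(2, k)
    )


# exhaustive check over all orderings of k distinct numbers, k = 3..6
print(all(lemma2_holds(list(p)) for k in range(3, 7) for p in itertools.permutations(range(k))))
```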